Abstract


Certain tasks, such as formal program development and theorem proving, fundamentally
rely upon the manipulation of higher-order objects such as functions and predicates.
Computing tools intended to assist in performing these tasks are at present
inadequate both in the amount of ‘knowledge’ they contain (i.e., the level of support
they provide) and in their ability to ‘learn’ (i.e., their capacity to enhance that support
over time). The application of a relevant machine learning technique, explanation-based
generalization (EBG), has thus far been limited to first-order problem representations.
We extend EBG to generalize higher-order values, thereby enabling its
application to higher-order problem encodings.

Logic programming provides a uniform framework in which all aspects of explanation-based
generalization and learning may be defined and carried out. First-order Horn
logics (e.g., Prolog) are not, however, well suited to higher-order applications. Instead,
we employ λProlog, a higher-order logic programming language, as our basic framework
for realizing higher-order EBG. In order to capture the distinction between domain
theory and training instance upon which EBG relies, we extend λProlog with the
necessity operator □ of modal logic. We then provide a formal characterization of both
the extended logic, λ□Prolog, and of higher-order EBG over λ□Prolog computation.
We also illustrate applications of higher-order EBG within program development and
theorem proving.

Within the architectures of traditional learning systems, the language for problem
representation and solution (i.e., the programming language) is separated from the
underlying learning mechanism. Herein we propose an alternative paradigm in which
generalization and assimilation are realized through integrated features of the
programming language, and are therefore under programmer control. In this way, the developer
can leverage domain knowledge and provide for user interaction in the programming
of learning tasks. Thus, while A^ Prolog (the logic extended with generalization and
assimilation features) is not itself a learning system, it is intended to serve as a
flexible, high-level foundation for the construction of such systems.

For A^Prolog to afford this programmable learning, constructs are necessary for
controlling generalization, and for assimilating the results of generalization within the
logic program. The problem with assert, the standard means by which Prolog programs are
extended, is that the construct is not semantically well-behaved. A more
elegant alternative (adopted, for example, in λProlog) is implication with its
intuitionistic meaning, but the assumptions so added to a logic program are of limited
applicability. We propose a new construct, rule, which combines the declarative
semantics of implication with some of the power of assert. Operationally, rule provides
for the extension of the logic program with results that deductively follow from that
program. We then extend rule to address explanation-based generalization within
another new construct, rule_ebg. While rule and rule_ebg are developed in the
framework of λProlog, the underlying ideas are general, and therefore applicable to
other logic programming languages.

In  addition  to  developing  and  formally  characterizing  the  A^  Prolog  language,  this  thesis 
also  provides  a  prototype  implementation  and  numerous  examples. 
Chapter 1
Introduction 


Broadly speaking, this thesis should be viewed as a language design effort. Rather than
starting from scratch, this effort considers extensions to the higher-order logic programming
language λProlog [100], itself the subject of current research. These enhancements focus
upon the incorporation of generalization and learning, and in particular explanation-based
generalization (EBG) and learning (EBL), within the framework of λProlog. While learning
is central to this work, the dissertation does not contain any demonstrations of performance
improvement, such as timing evaluations or learning curves. This is because the
focus of this thesis is not a ‘stand alone’ problem solver that learns. Rather, it is a
programming language: A£ Prolog.

The preceding distinction is fundamentally important to the evaluation of this work. First,
unlike typical learning systems, A£ Prolog does not pose its own learning problems. Instead,
A° Prolog incorporates constructs that provide for programmable generalization and
assimilation. By integrating learning mechanisms within the programming language, we defer one
of the more difficult problems faced by a ‘learner’: determining over what computations to
attempt learning, or in other words, determining when to learn. Our approach allows the
programmer (or client) to explicitly control learning within the same language as that in
which the problem is encoded. We claim that it is the client who can best coordinate
learning, as he is in the best position to leverage domain knowledge and user interaction.
Although A£ Prolog is not itself a learning system, it is intended to serve as a high-level
foundation for the implementation of such systems.

This  thesis,  however,  embodies  more  than  just  a  novel  approach  to  the  formulation  of  learning 
tasks  within  logic  programs:  we  herein  extend  and  reformulate  the  paradigm  of  EBG,  and 
moreover,  develop  semantically  straightforward  means  by  which  EBG  and  more  limited 
forms  of  generalization  can  be  integrated  within  the  logic  programming  framework.  Some 
background  is  in  order. 

Higher-order representation languages. We take the representation language to
be the subset of a programming language concerned with the encoding (or representation)
of data. For conventional programming languages, the representation language includes
expressions for booleans, integers, reals, strings, etc. Typical representation languages are
first-order in that there are no special primitives to encode higher-order, or argument-taking,
values such as functions, procedures, and predicates. This, in turn, means that in order to
program over higher-order domains (that is, formulate tasks that manipulate higher-order
objects), the client must himself come up with an encoding, and then explicitly program
fundamental higher-order operations (e.g., substitution, matching, the occurs check¹). To
some extent, this situation is analogous to a programmer having to explicitly code basic
arithmetic or string operations.

Of course, for the vast majority of programming, the data being represented is first-order,
and thus higher-order representation languages are irrelevant. However, for programming
over higher-order domains, such as mathematics (where the objects to be manipulated
include functions and predicates) and programming itself (in which programs are
manipulated), higher-order expressivity is of substantial importance. Witness the success
of higher-order programming languages such as ML, LISP, Scheme, and λProlog.

Explanation-based generalization. We have mentioned that higher-order languages are
particularly suited to such tasks as formal program development and theorem proving. The
tools which perform or assist such tasks, however, are at present inadequate both in the
amount of ‘knowledge’ they contain (i.e., the level of support they provide) and in their
ability to ‘learn’ (i.e., their capacity to enhance that support over time).

The application of a relevant machine learning technique, explanation-based generalization,
has thus far been limited to generalizing first-order representation languages. By extending
this technique to higher-order EBG (explanation-based generalization in which the
candidates for generalization include higher-order objects), we facilitate EBG’s application to such
naturally higher-order domains as program development and theorem proving.

EBG establishes the weakest preconditions sufficient to apply a particular problem solving
strategy in general, thereby speeding the subsequent solution of analogous problems.
Recently, the logic programming paradigm has been touted as a foundation for EBG because of
its declarative nature, because of its support for unification and search, and because it admits a
common representation for all aspects of EBG. However, for the domains in which we are
interested, the first-order representation language of Prolog is inadequate. Instead, we provide
a formulation of higher-order EBG over λProlog, which is one of the fundamental contributions of
this dissertation.

‘Logical’ and programmable mechanisms for controlling generalization and
assimilation. In order to utilize these generalizations, there must be a mechanism for
extending the existing logic program 𝒫 with new clauses. The problem with assert, the standard
means by which Prolog programs are extended, is that the construct is not
semantically well-behaved. As a result, programs making use of assert are harder to reason
about and manipulate (e.g., compile). For this and other reasons, assert is not part of
λProlog. We herein propose a new construct, rule, which has some of the power of
assert, but also offers a straightforward semantics reconcilable with λProlog. Operationally,
rule provides for the extension of the logic program with results that deductively follow
from that program. Later we devise an analogous construct, rule_ebg, that constructs and
assimilates explanation-based generalizations. Herein lies a second thesis contribution:
semantically straightforward constructs for controlling generalization and assimilation within
the logic programming framework.

¹ The occurs check, as familiar from logic programming, determines whether a variable occurs free within
an expression [124, pp. 69-70].

Aj?Prolog. The enhancement of λProlog with higher-order EBG and with a means for
coordinating learning yields A° Prolog. The early chapters of this dissertation set the stage
for this language by developing each of the preceding topics, relying heavily throughout
upon examples. (In fact, our examples were produced via a prototype implementation
of A°Prolog.) The third thesis contribution is this language itself, which we claim is an
attractive vehicle for the formulation of tasks that benefit from its higher-order
representation logic and programmable generalization and learning. Moreover, A^Prolog provides this
functionality in such a way that user interaction (for addressing problems beyond the
capabilities of the logic program itself) can be smoothly integrated.

1.1  Thesis  Contributions 

In  somewhat  more  concrete  terms,  this  thesis  contributes  the  following: 

• The rule construct, which provides a semantically sound means for universal
generalization (i.e., the selective universal quantification of free variables) within the logic
programming paradigm, and in particular λProlog.

•  A  formal  account  of  rule,  and  its  variant  lemma. 

•  An  alternative  formulation  of  the  EBG  paradigm  relying  upon  modal  logic  to  formalize 
the  heretofore  tacit  distinction  between  domain  and  training  theory. 

• The extension of the EBG algorithm to treat higher-order representation languages.

• A formal account of higher-order EBG (in terms of a λProlog meta-interpreter).

• The realization of EBG as a programmable feature of A£ Prolog through a
generalization of the rule construct: rule_ebg.

• A formal account of rule_ebg, and its variant lemma_ebg.

•  The  integration  of  all  of  the  above  within  A  £  Prolog  in  a  manner  that  admits  user-guided 
problem  solving  and  generalization. 
• Numerous examples.


Chapter  2 
Motivation 


This chapter attempts to establish an appropriate framework for the ideas developed within
this dissertation. To that end, it is necessary to discuss the application of formal
methods to program development and theorem proving, a topic much more thoroughly and
eloquently treated by others: see, for example, Dijkstra [34, 36], Sintzoff [119], Gries [51],
Broy [11], Bauer [5], Scherlis & Scott [116], Balzer, Cheatham & Green [2], Meyer [82],
Constable, et al. [20], Barwise [3], and more recently, Gerhart [48] and Wing [135]. Readers
primarily interested in our extensions of the explanation-based learning paradigm or in our
enhancements to the logic programming language λProlog may find this chapter largely
superfluous.

The results of the programming process, namely programs, are necessarily expressed
formally, that is, within formal language. Similarly, an increasing amount of mathematics is
carried out within formal language. A formal language is one that is mathematically precise:
it has a well-defined syntax (e.g., through a BNF grammar) and a well-defined
semantics (e.g., through a mathematical model). Formalization is, then, the process of
codifying ideas expressed informally (e.g., in a natural language) within a formal language. In
general, formalization requires resolving ambiguity, thereby achieving the precise expression
(and hence communication) of concepts. The term ‘formal method’ is applied
to paradigms that rely more strongly upon formal language.

While there exists a wealth of tools to assist the tasks of program development and theorem
proving, the level of support they provide is generally inadequate. If we are to build tools
that offer a substantially higher level of functionality, it is essential that we continue to place
a greater reliance upon formal methodologies. Formal techniques
facilitate a wider and deeper penetration of machine support, simply because they require
that more of the relevant ‘knowledge’ be encoded in a formal (i.e., machine-manipulable)
language.

Of course, successfully coding programming or theorem proving ‘knowledge’ a priori is
impossible due to the scope, complexity, and evolutionary nature of these domains. Rather,
tools must support the assimilation of experience gained in the course of solving problems.
However, simply memoizing (i.e., caching) particular solutions will be insufficient; instead,
experience must be abstracted or generalized. Learning, the ability to generalize and
assimilate from experience, will therefore have a significant impact on the success of future tools
and methods.

Figure 2.1: Initial Approach
(Diagram labels: Informal Requirements (Natural Language); Programming;
Formal Implementation (Programming Language); Validation.)

The vehicle we have chosen for experimenting with generalization and learning over the
domains of program development and theorem proving is the higher-order logic programming
language λProlog. These domains demand a higher-order treatment in that they require the
manipulation of higher-order objects, such as functions and predicates. λProlog provides
the simply-typed λ-calculus for the representation of higher-order objects, and furthermore
supports higher-order programming, that is, the ability to create goals and programs and
pass them as arguments. The latter concern is particularly relevant for the realization of
programming tools, since it affords the ability to manipulate the logic program itself.


2.1  New Computing Tools for Programming and Mathematics

2.1.1  Tools  for  Programming 

An initial view of the programming process is depicted in Figure 2.1: given a set of informal
requirements, programming is the task of constructing a (formal) program to meet those
needs. This paradigm is, however, problematic. Consider that programmers are often given
the task of determining what it is that existing code does. Can you answer that question
for the program of Figure 2.2, which is taken from Bentley’s Writing Efficient Programs [6,
p.60]?¹ The only substantive modification I have made is to replace the procedure’s
descriptive name with f. The answer is given below.


    f(n) ⇐ var a, b, i : integer
    begin
        if n < 0 then return 0;
        if n < 2 then return 1;
        a ← 1;
        b ← 1;
        for i ← 1 to (n div 2) do begin
            a ← a + b;
            b ← b + a
        end;
        if odd(n) then b ← b + a;
        return b
    end

Figure 2.2: A puzzling program

¹ The choice of example is borrowed from a presentation by William Scherlis.

Program specification and program abstraction. One difficulty with producing
understandable code is that the goals of clarity and efficiency tend to be mutually exclusive.
Now consider a formal specification of the preceding program, given in Figure 2.4. From
this, the reader presumably has little trouble recognizing the Fibonacci function. What is
it that distinguishes specification and implementation? Primarily, the specification is more
abstract, in that much of the detail of the implementation has been omitted, while the
implementation is more efficient (in this case because it does not recompute values). In
general, programs are made efficient by making commitments to data representation, order
of computation, etc., and then optimizing on the basis of those commitments. This process
of specialization necessarily complicates the functionality with procedural detail capturing
how that functionality is to be achieved [115, 116].
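
To make the contrast concrete, the following Python sketch places a direct transcription of the
recursive specification next to an implementation that commits to an order of computation, and
checks that the two agree on small inputs. The names fib_spec and fib_impl, and the choice of
Python, are assumptions made for illustration; the implementation is a plain two-variable loop
written for this sketch, not a transcription of Figure 2.2, and both functions are intended only
for n >= 0.

```python
def fib_spec(n: int) -> int:
    """Specification: f(0) = f(1) = 1 and f(n) = f(n - 1) + f(n - 2)."""
    if n < 2:
        return 1
    # Clear, but recomputes the same values many times.
    return fib_spec(n - 1) + fib_spec(n - 2)


def fib_impl(n: int) -> int:
    """Implementation: carries the two most recent values forward,
    so no value is ever recomputed."""
    prev, curr = 1, 1  # f(0), f(1)
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr
    return curr


if __name__ == "__main__":
    # The implementation is only trusted here because it is checked
    # against the specification on a finite range of inputs.
    assert all(fib_spec(n) == fib_impl(n) for n in range(20))
    print([fib_impl(n) for n in range(8)])
```

The specification reads almost like the mathematical definition, while the loop hides that
definition behind the decision to keep exactly two running values; that hidden decision is the
kind of procedural detail the paragraph above refers to.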

Like the implementation, the Fibonacci specification is formal since it is expressed in a
(potentially) formal language. Languages for formal specification are generally characterized as
abstract, very high-level, or wide-spectrum, and often are nondeterministic or nonexecutable.
The term ‘wide-spectrum’ is indicative of the same language serving for both specification
and implementation (although either might be restricted to a particular subset of that
language).

The construction of a formal specification divides the task of programming into two parts:
the transition from informal requirements to formal specification, or ‘what’, and the
transition from specification to efficient implementation, or ‘how.’ In the ideal, then, specification
languages serve to express the program’s intended functionality unencumbered by details
of computation strategy. (Our use of the term ‘program’ is intended to encompass the
spectrum from specification to implementation.) The resulting programming methodology
is illustrated in Figure 2.3. A substantial number of general and special purpose
specification languages have been developed: consider, just for example, Larch [52, 53],
CIP-L [4], GIST [134], Refine [120], Z [122], and RAISE [101]. (Meyer provides an
accessible introduction to formal specification and contrasts the approach with natural language
specification [82], while Wing gives an overview [135].)

Figure 2.3: Program Specification Approach

    f(0) = 1
    f(1) = 1
    f(n) = f(n - 1) + f(n - 2)

Figure 2.4: A less puzzling program

Our discussion of high-level languages has not touched upon the range of useful
abstraction techniques in programming languages. Consider, again just for example, abstract data
types, polymorphism, program libraries, object-oriented techniques, Dershowitz’s program
templates [28], idioms in the Programmer’s Apprentice [111, 133], structuring techniques
such as module and interface definitions, and finally those abstractions associated with
computer-aided software engineering (CASE) management tools [18]. But while abstractions
of programming language go a long way toward easing the task of program development, we
claim that they are inherently limited.

Program rationales. Consider again the task of understanding the Fibonacci
implementation of Figure 2.2. Given that the formal specification captures what the implementation
is doing, the programmer is still left to determine how it is accomplished. Reverse
engineering refers to this problem of a posteriori reconstructing the rationale for an implementation,
a nontrivial task for even our simple example. An informal rationale linking the Fibonacci
specification and implementation is given in Figure 2.5. We claim that it is the combination
of specification and rationale that best elucidates the implementation.

1. Compute successive pairs of Fibonacci numbers with a tail-recursive program:

       (u, v) ← (v, u + v)

2. Replace tail recursion with iteration.

3. Unroll the loop once. This allows the removal of a trivial assignment and a temporary
   variable.

Figure 2.5: Fibonacci Rationale

What should be the real results of the program design process? If program executions
were the sole results of interest, then it would be sufficient that the result of programming
be simply a program. But this is rarely the case in practice. Most programs undergo
analysis, modification, and adaptation, usually during their original development. Under
these circumstances, delivering the program itself is simply not sufficient. An indication is the
experience of software maintainers, who spend the majority of their time reverse engineering
existing code. (Lientz & Swanson’s survey claims that maintenance represents approximately
70% of software cost [78].) The problem is that more knowledge has been brought to bear
on the implementation of a program than is evident in the code alone. This knowledge may
be presented in the form of a rationale or design history of the program [116, 33]. Within
a design-based paradigm, the designed objects should not be programs, but rather program
rationales.

Program modification is ordinarily very difficult for programmers because, like a posteriori
verification, it requires rediscovery of concepts used during the development of the
implementation. But by preserving the rationale of the initial program, it is often possible to
pinpoint the design decision that must be altered, and carry over (i.e., replay [97]) much of
the remaining structure of the original development.

Figure 2.6 illustrates this paradigm. The new formal object, the design record, serves
as a ‘road map’ from specification to implementation; that is, it captures the sequence of
design decisions (represented formally) by which the implementation can be derived from
the specification. The rationale may be considered a meta-program in that it is a program
that manipulates other programs.
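
The view of a design record as a meta-program can be pictured with a deliberately small Python
sketch. Here a toy arithmetic-expression language stands in for programs, and a design record
is an ordered list of named rewriting steps that replay applies to a specification. Everything
in it (the tuple encoding of expressions, fold_constants, drop_unit, replay, and design_record)
is a hypothetical illustration and is not taken from the thesis or from λProlog.

```python
from typing import Callable, List, Tuple, Union

# A toy "program": an int constant, a variable name, or a tuple
# (operator, left, right) whose operator is "+" or "*".
Expr = Union[int, str, tuple]
Step = Tuple[str, Callable[[Expr], Expr]]


def bottom_up(rewrite: Callable[[Expr], Expr], e: Expr) -> Expr:
    """Apply `rewrite` to every subexpression, innermost first."""
    if isinstance(e, tuple):
        op, left, right = e
        e = (op, bottom_up(rewrite, left), bottom_up(rewrite, right))
    return rewrite(e)


def fold_constants(e: Expr) -> Expr:
    """Design decision: evaluate an operation on two constants."""
    if isinstance(e, tuple) and isinstance(e[1], int) and isinstance(e[2], int):
        return e[1] + e[2] if e[0] == "+" else e[1] * e[2]
    return e


def drop_unit(e: Expr) -> Expr:
    """Design decision: remove multiplication by one."""
    if isinstance(e, tuple) and e[0] == "*":
        if e[1] == 1:
            return e[2]
        if e[2] == 1:
            return e[1]
    return e


def replay(record: List[Step], spec: Expr) -> Expr:
    """Derive an implementation by applying each recorded step in order."""
    program = spec
    for _name, rewrite in record:
        program = bottom_up(rewrite, program)
    return program


# The design record: a program whose data are other programs.
design_record: List[Step] = [
    ("fold constants", fold_constants),
    ("drop unit factors", drop_unit),
]

if __name__ == "__main__":
    print(replay(design_record, ("*", "x", ("+", 0, 1))))
    print(replay(design_record, ("+", ("*", 1, "y"), ("*", 2, 3))))
```

The first call reduces x * (0 + 1) to x; the second reuses the same record on a related
expression, 1 * y + 2 * 3, and yields y + 6. Reusing a record in this way is a small-scale
analogue of replaying a development after the specification changes.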

Successive refinement paradigms are a step toward design-based development in that the
informal programming process is replaced with a series of programs, beginning with the
specification and leading to the implementation, each of which is generally of better
performance than its predecessor. Consider, for example, capturing the evolution of Fibonacci
with a series of programs, each annotated with the appropriate comment from the rationale
of Figure 2.5. These successive programs might be expressed within a single wide-spectrum
language or within several layers of increasingly concrete languages. The successive
refinement model divides program development into more intelligible and more easily justified
steps, in that way affording greater confidence in the resulting implementation.
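
One such series might look as follows in Python, with one function per step of the rationale in
Figure 2.5. This is a sketch under stated assumptions: the function names, the starting pair
(1, 1) matching f(0) = f(1) = 1, and the final parity test in the unrolled version are choices
made here so that every stage agrees with the specification; none of the stages is a
transcription of Figure 2.2.

```python
def fib_pairs(n: int, u: int = 1, v: int = 1) -> int:
    """Step 1: tail-recursive program over successive pairs,
    (u, v) <- (v, u + v), starting from (f(0), f(1))."""
    if n == 0:
        return u
    return fib_pairs(n - 1, v, u + v)


def fib_loop(n: int) -> int:
    """Step 2: the tail recursion replaced with iteration."""
    u, v = 1, 1
    for _ in range(n):
        # Without tuple assignment this needs a temporary variable.
        u, v = v, u + v
    return u


def fib_unrolled(n: int) -> int:
    """Step 3: the loop unrolled once. Two pair-steps per iteration
    need neither the temporary nor the trivial assignment u <- v."""
    a, b = 1, 1
    for _ in range(n // 2):
        a = a + b  # a is now f(2k)
        b = b + a  # b is now f(2k + 1)
    # After k iterations, (a, b) = (f(2k), f(2k + 1)).
    return b if n % 2 else a


if __name__ == "__main__":
    for n in range(25):
        assert fib_pairs(n) == fib_loop(n) == fib_unrolled(n)
    print(fib_unrolled(10))
```

Each stage is easy to justify against its predecessor, which is the point of the refinement
model: agreement between the first and last stage follows from three small, local arguments
rather than one large one.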

There are, however, several reasons to encode program rationales formally rather than
informally. First, by making design records formal objects, we increase the likelihood that they
will actually be written and maintained. Consider the cynical yet all too relevant words of
Dave Storer [7, p.60]: “Don’t get suckered in by the comments; they can be terribly
misleading. Debug only the code.” More significantly, a formal design history is by definition
precise and, further, allows us to make claims about the relationship between specification
and implementation. For example, if each of the design steps is truth preserving, the
resulting implementation is correct by construction. Finally, explicit formal rationales may
themselves be manipulated by computing tools: in particular, they may be abstracted (to
produce programming strategy) and reused on related problems.

Figure 2.6: Program Rationale Approach

The  design-based  paradigm  of  programming  is  by  no  means  a  new  idea:  it  has  been  espoused 
by  many  researchers  through  a  series  of  methodologies  over  several  years.  These  approaches 
range  from  unstructured  successive  prototyping  schemes  to  design  frameworks  in  which  an 
account  (a  rationale)  is  associated  with  the  individual  steps  of  program  development.  Within 
these  structured  design-based  paradigms,  rationales  may  themselves  be  formal  (rules,  proof 
steps)  or  informal  (comments).  The  following  enumerates  work  on  the  formal  end  of  this 
spectrum: 

• Program transformation (e.g., [13, 42, 68, 115, 121]): This approach is best visualized
as a state/operator space in which the states are programs and the operations are
transformations mapping between programs. Typically, program transformations are
largely syntactic manipulations, which improve efficiency when sequenced effectively.
They may or may not preserve the meaning of the program. Transformations have
been applied in a number of different ways:

  - automatically, often within the compilers of high-level languages. Here no
    explicit design record is built; rather it is implicit in the transformation engine or
    expressed by directives given to the compiler (typically as annotations within the
    programming language).

  - via an explicit meta-program, that is, a sequence of transformations specified a priori.

  - interactively.

The term ‘program transformation’ is very broad, encompassing any well-defined
operation on a program. (See Chapter 7 for examples and further discussion.)

• Interactive constructive programming or proofs as programs (e.g., NuPRL [20],
Calculus of Constructions [21]): These are paradigms in which the programmer proves a
theorem (or specification) constructively, thereby implicitly defining an algorithm for
producing the answer. The program is then extracted from the proof.

• Proof transformation [104, 80]: This is a hybrid of the transformation and
proof-based approaches in which proofs rather than programs are transformed. This
combination moves correctness considerations outside of the transformation semantics, since
the resulting proofs may be checked for validity (presumably by machine).
2.1.2  Tools for Mathematics


Unlike programs, mathematical proofs are not inherently formal: their role has simply been
to convince other mathematicians of a theorem’s validity [24, 3]. And in terms of human
consumption, informal arguments are generally superior to formal ones. Nevertheless, in
order to enhance the application of computing tools to theorem proving tasks, proofs should
be represented formally. The same arguments applied within the discussion of program
development are relevant here: (1) formal proofs are precise, (2) they may be manipulated,
and in particular, checked and/or reapplied by a machine, and (3) they may be generalized
(i.e., abstracted) by machine to represent theorem proving strategy, and thereby utilized in
new situations.

The above discussion suggests an analogy between theorems (mathematical formulas) and
specifications (programs), and also between proofs and program rationales [116]. We shall
use the term derivation to encompass the formal reasoning supporting either a theorem or a
program implementation, and the discussion to follow pertains to both domains. In fact, for
the ‘proofs as programs’ paradigm, the process of programming is reduced to one of theorem
proving.


2.1.3  Formalization 

Formal  methods  have  been  maligned  in  the  literature:  see  in  particulM,  De  Millo,  Lipton 
&  Perlis  [24]  and  Fetzer  [45].  The  essential  issue  is  the  open  philosophical  question  of  the 
extent  to  which  mathematical  and  programming  knowledge  can  be  formalized.  To  some 
degree,  the  deeper  relevance  of  the  thesis  rests  upon  the  formalist’s  viewpoint:  that  any  line 
of  reasoning  may  be  formalized.  This  discussion  is,  however,  beyond  the  scope  of  this  thesis. 
Interested readers should pursue the above references, as well as Barwise’s reply [3] and that
offered  by  Scherlis  and  Scott  [116].  The  debate  has  continued  most  recently  with  an  article 
by  Dijkstra  on  computer  science  education  and  the  responses  it  elicited  [35]. 

A more limited criticism of formal methods, espoused for example by some software engineers,
is that while formal techniques work well for mathematically elegant problems (e.g.,
Fibonacci), they are ineffective at addressing the range and scale of problems faced by
programmers. Our response is fourfold: First, we need more expressive and special purpose
specification  languages  that  can  formally,  concisely,  and  elegantly  capture  the  functionality 
of  a  greater  variety  of  systems.  Second,  we  need  more  expressive  meta-languages  to  capture 
programming design decisions, again formally, concisely, and elegantly. Third, the tools
supporting formal methodologies must provide substantially greater levels of assistance to the
user.  Of  course,  the  goals  of  better  languages  and  better  tools  are  not  mutually  independent: 
progress  on  tools  is  limited  by  the  elegance  and  expressivity  of  the  language,  and  language 
improvements  will  come  out  of  better  tools. 

And finally, it must be recognized that formal methods are one component of an overall
software engineering strategy: in suggesting the design-based view, we are not advocating the
replacement of existing methodologies and tools with some new all-encompassing paradigm.
Instead, we believe that for certain tasks — for example, highly optimized or mathematical
algorithms, or procedures requiring a high degree of confidence — the design-based
approach will become feasible. The way to proceed, then, is to make design-based paradigms
more broadly applicable, to make them easier to use, and to integrate them within existing
methodologies and tools.

2.2 The Relevance of Logic Programming and λProlog

Generally,  search  and  unification  (matching)  are  integral  components  of  theorem  provers  and 
formal  programming  tools.  Logic  programming  languages  are  particularly  suited  to  these 
domains  because  of  their  implicit  support  for  both  search  and  unification.  This  allows  the 
programmer to focus at a high level, on the ‘logic’ if you will, with less regard for underlying
implementation details. The result is a clearer and more concise formulation of many problem
domains. 

At the same time, the tasks of program development and theorem proving are fundamentally
higher-order, since they rely upon the manipulation of higher-order objects — objects that
take other objects as arguments. Higher-order objects are most naturally represented with a
name binding operator such as λ. λProlog provides for the elegant representation of
higher-order objects in that it contains the typed λ-calculus as a datatype. Furthermore,
λProlog affords the manipulation of these higher-order encodings by providing the λ-specific
operations of αβη-conversion as well as higher-order unification.

λProlog is also an attractive meta-language for the expression of formal rationales and proofs.
This is because of the above, and because λProlog offers the further expressivity of nested
implication and explicit quantification. The language also provides a degree of polymorphism,
which allows logic programs (and thus derivations) to be abstracted over a particular
datatype.

In addition to higher-order objects, λProlog supports higher-order programming, in that one
may create goals and programs and pass them as arguments. This allows us to naturally
reason at the meta-level, that is, about the logic program itself. The expressiveness of
higher-order programming becomes important with multiple meta-layers of language, because it
elegantly facilitates reflection — the mapping from data to executable logic program, and
reification — the reverse mapping (§5.3.1). Indeed, since we are interested in program
development paradigms and theorem provers, we further use λProlog (albeit in a limited
way) as a meta-meta-language, for expressing the manipulation of derivations (i.e., rationales
and proofs).

In summary, λProlog is an attractive framework in which to experiment with tools for
theorem proving and program development because it is both a logic programming language
and a higher-order programming language. We expand upon this discussion within the next
chapter. A more complete argument is made within Felty & Miller [43] and Hannan &
Miller [59].

2.3 The Need for Generalization and Learning


To generalize an expression is to make that expression less specific — that is, to make it,
in some sense, more broadly applicable. To abstract an expression, on the other hand, is to
make it less detailed — that is, to abbreviate it in some manner. Typically, abstraction and
generalization coincide: operations that make an expression more broadly applicable most
often remove detail: for example, replacing a constant in rained tuesday with a variable
in rained x.² For the discussion to follow, we use the terms generalization and abstraction
interchangeably.

In  the  preceding  sections,  we  suggested  formal  representation  of  derivations  as  a  means  by 
which  to  capture  problem  solving  experience  or  design  knowledge.  An  overriding  limitation 
of  formal  derivations  is  that  they  are  often  so  verbose  as  to  be  unintelligible.  Indeed,  the 
attraction  of  informal  rationales  is  that  they  omit  less  pertinent  details.  Generalization 
is  thus  a  potential  avenue  for  making  formal  methods  more  palatable  to  the  user.  For 
example,  it  is  our  belief  that  by  abstracting  sequences  of  low-level  derivation  steps,  we 
can  produce  more  lucid,  high-level  derivations  (much  as  high-level  programming  languages 
abstract  sequences  of  machine  instructions). 

Effective  tools  for  supporting  formal  methods  will  contain  a  substantial  store  of  information, 
such  as  general  programming  techniques,  previously  solved  problems  (e.g.,  objects  and  their 
associated methods), problem domain theories, or derived mathematical results. As mentioned
earlier, successfully coding this ‘knowledge’ a priori is impossible due to its scope,
complexity, and evolutionary nature. Rather, tools must support the assimilation of experience
gained in the course of solving problems. Our previous claim — that formal tools must
support learning — simply means that they must facilitate the growth of this knowledge
base, and that such growth must often include abstracted or generalized experience. The
ultimate goal for generalization and learning within a design-based paradigm is to enable
the  construction  of  libraries  of  derivations  and  abstracted  derivations  analogous  to  (and  used 
in  combination  with)  the  program  libraries  of  today. 

We  do  not  suggest  that  the  technique  of  higher-order  explanation-based  generalization  and 
learning developed herein is sufficient for automating the spectrum of generalization techniques
necessary to realize this vision. Rather, we believe that EBL is an attractive technique
to explore, particularly because validity is not sacrificed in the generalization process: that
is, the results of explanation-based generalization are guaranteed to be sound (i.e., follow
from  the  existing  theory). 


² One notable exception to this is disjunction: while rained tuesday ∨ rained wednesday is more
‘general’ (i.e., less specific) than rained tuesday, it is not more ‘abstract’ (i.e., less detailed).


Chapter  3 

λProlog — A Higher-Order Logic
Programming Language


Nadathur & Miller introduce the higher-order logic programming language λProlog [100].
λProlog extends traditional logic programming languages

• by providing the simply-typed λ-calculus as a data-type,

• by incorporating the higher-order unification required for λ-terms,

• by including more expressive logic constructs: embedded implication and explicit
quantification,

• by admitting higher-order predicates in a principled manner,

• by providing a degree of polymorphism, and

• by supporting abstraction mechanisms such as modules and higher-order data-types.

Within this chapter we briefly introduce λProlog. While this work relies upon and extends
λProlog, the language is itself a research prototype. (This chapter presumes some familiarity
with Prolog and the typed λ-calculi; respective introductions are Sterling & Shapiro [124]
and Hindley & Seldin [63].)


3.1  The  Logic  Programming  Framework 

In  general,  logic  programming  languages  offer  several  features  relevant  to  formal  program 
development  and  theorem  proving: 

•  The  underlying  support  for  unification  facilitates  the  implementation  of  rewrite  and 
inference  rules  (e.g.,  program  transformations  and  theorem  proving  tactics).  Typically, 
the  application  of  such  rules  relies  upon  unification  or  matching  —  unification  in  which 
only  one  term’s  variables  may  be  instantiated.  Moreover,  logic  programming’s  support 
for  unification  is  unobtrusive,  allowing  rules  to  be  elegantly  expressed. 

•  The  logic  programming  clause  was  designed  to  encode  inference  rules:  the  head  of 
the  clause  specifies  the  conclusion  of  a  rule,  while  the  body  contains  its  premises. 
Rewrite rules may also be elegantly expressed as clauses: the head typically consists
of  a  predicate  relating  the  rewrite’s  input  and  output,  and  the  body  specifies  any 
necessary  preconditions. 

•  The  implicit  search  paradigm  of  depth-first  backchaining  is  often  sufficient  for  applying 
inference  rules.  And  even  for  cases  in  which  it  is  inadequate,  the  default  search  strategy 
can  still  aid  specification,  prototyping,  and  testing. 

• The default search mechanism may be elegantly augmented with programmer-defined
control,  including  guidance  through  user-interaction. 

However, first-order logic programming languages, such as Prolog, suffer from other restrictions
that make them less suitable for higher-order problem domains.


3.2 The Simply-typed λ-Calculus

λProlog replaces Prolog’s first-order terms (i.e., Herbrand terms¹) with terms of the
simply-typed λ-calculus. The λ-calculi are a family of languages introduced to study higher-order
programming. Often the only data-type the pure forms of these languages contain is that
of λ-terms — functions constructed with the binding operator λ. λ-calculi are nevertheless
rich languages with which to experiment, in part because more complex data-structures may
be encoded as functions [63].

Simple types. Before discussing λ-terms, we introduce the type system over λ-terms.
Simple types may be inductively defined as

τ ::= a | A | τ₁ → τ₂ | a τ₁ ... τₙ

where τ ranges over simple types, a over type constants, and A over type variables. General
notation: We use boldface for constants (as well as for meta-variables such as a, which range
over constants), and italics for variables (as well as for meta-variables such as τ). Function
types are constructed with →, which associates to the right: A → B → C is read as
A → (B → C). Type constructors consist of a type constant followed by some number of
argument types (e.g., list int).

Two predefined type constants of particular interest are int — the type of integers, and o
— the type of λProlog propositions (goals and clauses). New type constants are defined by
explicit declaration:

kind bool type.

kind list type → type.


¹ Herbrand terms (i.e., those within the Herbrand universe) may be defined as
M ::= c | f M₁ ... Mₙ

where c ranges over first-order constants and f over function constants [124].
The above use of → produces a kind rather than a type: types ensure that λ-terms are
well-formed, while kinds ensure that the simple types themselves are well-formed [63].

λ-terms. Simply-typed λ-terms may then be inductively defined as
M ::= c | x : τ | M N | λx : τ. M

where M and N range over terms, c ranges over constants, x over variables, and τ ranges over
simple types. The types of constant terms are separately declared within a type signature:

type succ int → int.

A given λ-abstraction λx:τ.M is of function type τ → τ′ provided M has type τ′. The
juxtaposition M N denotes a λ-term application, which is of type τ′ provided M is of type
τ → τ′ and N is of type τ. λ-term application associates to the left: a b c is read as (a b) c.
Thus the Prolog term p(a,b) is written as p a b in λProlog.
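
The typing rules for abstraction and application can be made concrete with a small checker. The following Python sketch is illustrative only and is not part of λProlog: it covers the monomorphic fragment (no type variables), represents base types as strings, and uses the hypothetical names Arrow, Const, Var, App, Lam, and type_of. The constant succ comes from the signature above; zero is an assumed extra constant.

```python
from dataclasses import dataclass

# Simple types: a base type is a string (e.g. "int"); Arrow(a, b) is a -> b.
@dataclass(frozen=True)
class Arrow:
    arg: object
    res: object

# Terms follow the grammar  M ::= c | x | M N | λx:τ. M.
@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Lam:
    var: str
    var_type: object
    body: object

def type_of(term, signature, env=None):
    """Return the simple type of term, or raise TypeError if it is ill-typed."""
    env = env or {}
    if isinstance(term, Const):
        return signature[term.name]          # declared in the type signature
    if isinstance(term, Var):
        return env[term.name]                # bound by an enclosing λ
    if isinstance(term, Lam):
        # λx:τ.M has type τ -> τ' provided M has type τ' (with x : τ in scope).
        body_type = type_of(term.body, signature, {**env, term.var: term.var_type})
        return Arrow(term.var_type, body_type)
    if isinstance(term, App):
        # M N has type τ' provided M : τ -> τ' and N : τ.
        fun_type = type_of(term.fun, signature, env)
        arg_type = type_of(term.arg, signature, env)
        if not isinstance(fun_type, Arrow) or fun_type.arg != arg_type:
            raise TypeError(f"cannot apply {fun_type} to {arg_type}")
        return fun_type.res
    raise TypeError(f"unknown term {term!r}")

# Assumed signature: succ is from the text, zero is hypothetical.
SIGNATURE = {"succ": Arrow("int", "int"), "zero": "int"}
twice_succ = Lam("x", "int", App(Const("succ"), App(Const("succ"), Var("x"))))
```

Under these assumptions, type_of(twice_succ, SIGNATURE) yields the function type from int to int, while applying zero to an argument is rejected.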

Type reconstruction. λ-terms become exceedingly redundant if all of the types required
by the syntactic definition are explicitly included. A more succinct representation is afforded
by eliding unnecessary type information. Type reconstruction is the process of rederiving
those omitted types. In practice, all types are omitted from λProlog terms. Instead, the
types of variables, untyped constants, abstractions, and applications are determined from
context. In the sequel, we will omit types with the understanding that they are to be derived
through type reconstruction.

Type reconstruction may fail for λ-terms that cannot be well-typed — i.e., typed subject to
the preceding rules. Similarly, type reconstruction may assign a lax or nonrestrictive type
to λ-terms for which insufficient type information has been provided. Error and warning
messages, respectively, are typically provided by λProlog implementations.

Polymorphism. The inclusion of type variables within simple types offers a simple form
of polymorphism. For example, the polymorphic identity function id = λx:A. x and double
function db = λf:A → A. λx:A. f (f x) are given the types

type id A → A.

type db (A → A) → A → A.

Particular occurrences of id or db may then be ‘instantiated’ (through the binding of A) to
operate on distinct types, such as within id 1 and id db. The polymorphism of the
simply-typed λ-calculus is also termed ‘ML-style’ or ‘Milner-style’ after the polymorphism of the
language ML [90].
Basic operations on λ-terms. We use the notation [N/x]M to denote the substitution
of N for free occurrences of x in M. (Bound variables may have to be renamed to avoid
capture; see below.) The term operations supported by λProlog include β- and η-reduction
as well as α-conversion, which are defined as follows:

(λx.M) N ⇒β [N/x]M                            e.g., (λx.λy.f x y) y ⇒β λy′.f y y′

λx.M x ⇒η M     provided x not free in M      e.g., λx.f x ⇒η f

λx.M ⇒α λy.[y/x]M   provided y not free in M  e.g., λx.x ⇒α λy.y
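
To see why bound variables may need renaming, the rules above can be sketched in a few lines of Python. This is an illustrative, untyped encoding rather than λProlog's implementation: terms are nested tuples ("var", x), ("app", M, N), and ("lam", x, M), constants such as f are treated as free variables, and the helper names free_vars, fresh, subst, beta, and eta are assumptions of this sketch.

```python
def free_vars(t):
    """Return the set of names occurring free in term t."""
    tag = t[0]
    if tag == "var":
        return {t[1]}
    if tag == "app":
        return free_vars(t[1]) | free_vars(t[2])
    return free_vars(t[2]) - {t[1]}          # lam: the bound name is not free

def fresh(name, avoid):
    """Append primes to name until it clashes with nothing in avoid."""
    while name in avoid:
        name += "'"
    return name

def subst(n, x, m):
    """Compute [n/x]m, renaming bound variables of m to avoid capture."""
    tag = m[0]
    if tag == "var":
        return n if m[1] == x else m
    if tag == "app":
        return ("app", subst(n, x, m[1]), subst(n, x, m[2]))
    y, body = m[1], m[2]
    if y == x:
        return m                             # x is shadowed; nothing to replace
    if y in free_vars(n) and x in free_vars(body):
        # alpha-convert the binder so that it cannot capture a free y in n
        y2 = fresh(y, free_vars(n) | free_vars(body))
        body = subst(("var", y2), y, body)
        y = y2
    return ("lam", y, subst(n, x, body))

def beta(t):
    """Contract a top-level beta-redex (λx.M) N to [N/x]M; else return t."""
    if t[0] == "app" and t[1][0] == "lam":
        return subst(t[2], t[1][1], t[1][2])
    return t

def eta(t):
    """Contract a top-level eta-redex λx.M x to M when x is not free in M."""
    if (t[0] == "lam" and t[2][0] == "app"
            and t[2][2] == ("var", t[1])
            and t[1] not in free_vars(t[2][1])):
        return t[2][1]
    return t

f, x, y = ("var", "f"), ("var", "x"), ("var", "y")
redex = ("app", ("lam", "x", ("lam", "y", ("app", ("app", f, x), y))), y)
```

Here beta(redex) renames the inner binder to y' before substituting, reproducing the example (λx.λy.f x y) y ⇒β λy′.f y y′.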

Closures over these operations yield corresponding notions of λ-term convertibility: M is
said to be β-convertible to M′ if there exists a sequence of β-reductions and β-expansions
(the inverse), applied at the top-level or to subterms, transforming M into M′. We may
similarly define η- and α-convertibility, as well as combinations thereof. We use =αβη, or
simply =βη, to denote the equivalence relation of αβη-convertible λ-terms.

In this calculus, βη-reductions are normalizing and Church-Rosser [63] — that is, maximal
sequences of such reductions applied to a given well-typed λ-term terminate with a unique
λ-term said to be in βη-normal form. This property is a consequence of the typing given to
λ-terms, and is crucial for the unification algorithm, since the convertibility of two terms can
be tested by comparing their normal forms for equivalence modulo the renaming of bound
variables (α-conversion).²
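
Comparison modulo the renaming of bound variables can be illustrated by translating named terms into a nameless form, in the style of de Bruijn indices. The Python sketch below is an assumption-laden illustration, not the representation used by any λProlog implementation: terms are the tuples ("var", x), ("app", M, N), and ("lam", x, M), and to_de_bruijn and alpha_equivalent are hypothetical helper names.

```python
def to_de_bruijn(t, bound=()):
    """Replace each bound variable by its distance to the binding λ.

    bound lists the enclosing binders, innermost first; free variables
    keep their names so that distinct free names remain distinct.
    """
    tag = t[0]
    if tag == "var":
        name = t[1]
        if name in bound:
            return ("bound", bound.index(name))   # index of the innermost binder
        return ("free", name)
    if tag == "app":
        return ("app", to_de_bruijn(t[1], bound), to_de_bruijn(t[2], bound))
    # lam: the binder name is dropped; only the body is kept
    return ("lam", to_de_bruijn(t[2], (t[1],) + bound))

def alpha_equivalent(s, t):
    """Two terms are α-equivalent exactly when their nameless forms coincide."""
    return to_de_bruijn(s) == to_de_bruijn(t)

identity_x = ("lam", "x", ("var", "x"))
identity_y = ("lam", "y", ("var", "y"))
```

With this encoding, alpha_equivalent(identity_x, identity_y) holds by plain structural equality, so no renaming is needed at comparison time.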

Higher-order unification. Unification is the process of producing a common instance
from two or more terms by instantiating either term’s free variables with other terms.³
We use the λProlog notation M = N to indicate that the λ-terms M and N are to be
unified. When unifying terms, we are typically interested in the most general unifier (MGU);
for example, the MGU of p x and p y is simply (x = y), rather than the overly specific
(x = a, y = a). (We shall continue to use () to enclose unification constraints.)

Unification underlies the logic programming paradigm, but because λProlog terms are λ-terms,
λProlog unification must be higher-order — i.e., it must support the instantiation of variables
to functions as well as to first-order constants. Terms of the λ-calculi, however, do
not admit unique most general unifiers: consider that the unification of f a = c a a allows
the variable f to be instantiated with any of λx.c a a, λx.c x a, λx.c a x, or λx.c x x, none of
which is an instance of another (they are all closed). Thus, higher-order unification is
inherently nondeterministic. Even worse, Goldfarb shows that higher-order (and in particular,
second-order) unification is undecidable [49]. However, a semi-decision procedure effective
in practice is presented by Huet [66] and extended by Elliott [39].
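
The four solutions for f in f a = c a a can be enumerated mechanically. The Python sketch below is only a brute-force illustration for this one shape of problem and is not Huet's procedure: it represents the right-hand side as a flat tuple of symbols, assumes the bound variable name x does not already occur in it, and uses the hypothetical helpers abstractions and apply_to.

```python
from itertools import product

def abstractions(target, arg, var="x"):
    """Yield each body obtained by replacing some occurrences of arg by var.

    Every such body B gives a candidate λvar.B for the unknown function.
    """
    positions = [i for i, s in enumerate(target) if s == arg]
    for choice in product([False, True], repeat=len(positions)):
        body = list(target)
        for pos, replace in zip(positions, choice):
            if replace:
                body[pos] = var
        yield tuple(body)

def apply_to(body, arg, var="x"):
    """Beta-reduce (λvar.body) arg for a flat body: substitute arg for var."""
    return tuple(arg if s == var else s for s in body)

target = ("c", "a", "a")                       # the right-hand side c a a
solutions = list(abstractions(target, "a"))    # bodies of candidate λx-terms
```

Each body in solutions, applied to a, reduces back to c a a, and the four bodies are pairwise distinct closed functions once λx is prefixed, which is the nondeterminism described above.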


² Unlike βη-conversion, α-conversion is ‘nondirectional.’ Hence, in order to avoid expensive α-equivalence
tests, it is necessary to employ representations that do not explicitly name bound variables — e.g., de Bruijn
indices [22].

³ Knight presents an overview of unification research in [73]. For a formal treatment of first-order
unification, see Lassez, Maher & Marriott [76].

3.3 Higher-order Language


A domain is said to be higher-order if it contains higher-order values — that is, values which
take other values as arguments (e.g., functions and predicates). For instance, the values
manipulated by a higher-order programming language include functions. (By ‘manipulated’
we mean that functions are ‘first-class’ objects — i.e., they can be bound to variables, passed
as parameters, and returned from function calls.) Similarly, within a higher-order logic, the
values that can be quantified include functions and predicates.

On the other hand, we consider a representation language to be higher-order if it contains
a means for expressing argument binding: for example, the λ of λ-calculus or lambda in
LISP. Such languages are particularly amenable to representing the values of higher-order
domains, since the formulation of higher-order objects can be expressed with λ. We follow
common practice in overloading the term ‘higher-order’ by applying it to values and domains
(semantic entities), as well as languages (syntactic entities).

Higher-order domains. Many domains naturally involve binding constructs, and are
thus best represented within higher-order languages: logics, programming languages, and
natural language [106, 88, 85, 103]. This same need for higher-order representation also
arises when one wants to reason ‘at the meta-level’ — that is, about λProlog programs. One
would like facts (propositions) or properties (predicates) to be objects themselves. Prolog
and other first-order representation languages allow this to some extent, but in a way that
is operationally rather than logically motivated. λProlog, on the other hand, facilitates
higher-order programming — that is, the ability to create goals and programs, and pass
them as arguments.

Binding operators. Within a higher-order language, binding operators are typically
implemented via a single primitive such as λ. For example, the function f(x) = 2*x
might be represented simply as f = λx. 2 * x. Similarly, ∀x.∃y. x<y might be expressed
as pi (λx. sigma (λy. x<y)). (In fact, this is the representation used within λProlog; the
former is simply a more readable abbreviation.) The implementation of other binding
operators in terms of λ allows αβη-conversion and λ-term unification to be implemented once
within the representation language rather than within individual client programs [106, 60].
Relegating such tasks to the representation language makes for more succinct, elegant
programs.


Examples of λProlog. The higher-order predicate select of type
type select (A → o) → list A → list A → o.
may be encoded as

select P (x :: K) (x :: L) ⇐ P x, select P K L.

select P (x :: K) L ⇐ select P K L.

select P nil nil.
(As in Prolog, ‘,’ denotes conjunction. The symbol ⇐ represents implication, and is
equivalent to Prolog’s ‘:-’. Finally, :: stands for the cons operation of LISP.) select P K L ensures
that L is a sublist of K, every element of which satisfies P. The following query, for example,
selects the grandparents from the given input list:

?- select (λx. ∃y. parent y x, ∃z. parent z y) (tom :: kate :: van :: leo :: nil) L.

(To minimize the need for parentheses, λx. A x, ∃y. B x y is parsed as λx.(A x, ∃y.(B x y));
that is, ‘.’ binds very weakly to the right.)
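
The behaviour of select under depth-first search can be mimicked with a Python generator that yields each answer for L in the order the three clauses would produce it. This is a rough analogy under stated assumptions, not a translation of λProlog semantics: the predicate P becomes a Python function, unification is replaced by list construction, and the PARENT facts (read as child and parent pairs, so that parent y x means x is a parent of y) are hypothetical data chosen for illustration.

```python
def select(p, k):
    """Yield every list L such that `select P K L` holds, in clause order."""
    if not k:
        yield []                         # select P nil nil.
        return
    x, rest = k[0], k[1:]
    if p(x):                             # clause 1: keep x when P x holds
        for tail in select(p, rest):
            yield [x] + tail
    for tail in select(p, rest):         # clause 2: drop x unconditionally
        yield tail

# Hypothetical facts: (child, parent) pairs.
PARENT = {("kate", "tom"), ("leo", "kate")}

def grandparent(x):
    """x has a child y who in turn has some child z."""
    return any(parent == x and any(q == child for (_, q) in PARENT)
               for (child, parent) in PARENT)

answers = list(select(grandparent, ["tom", "kate", "van", "leo"]))
```

Because the second clause may drop an element even when P holds, the first answer is the maximal selection and later answers are its sublists; with the assumed facts, only tom qualifies as a grandparent.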

Readers  may  argue  that  select  could  be  formulated  within  Prolog  simply  by  replacing  P  x 
with  apply  P  x,  and  further,  that  grandparent  could  itself  be  encoded  as  a  top-level  Prolog 
predicate: 

grandparent x ⇐ parent y x, parent z y.

In many situations, however, the ‘inline’ expression of higher-order arguments (such as the
unnamed grandparent) is either necessary or desirable: for instance, reformulation as a
clause is not applicable to higher-order functions, which are not predicates of type o.
Moreover, first-order languages do not afford many operations over predicates, such as
composition:

select_or P Q K L ⇐ select (λx. P x ; Q x) K L.

(where the operator ‘;’ represents inclusive ‘or’.) Within Prolog, select_or cannot be
programmed in terms of select (at least not without more complicated list operations); the
closest approximation is the inequivalent

select_or P Q K L ⇐ select P K L.

select_or P Q K L ⇐ select Q K L.
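
The difference between the two formulations shows up as soon as the input list mixes elements satisfying P with elements satisfying Q. The following Python sketch is an analogy only, reusing a generator rendering of select; the names select_or_approx, is_even, and is_negative are hypothetical.

```python
def select(p, k):
    """Yield every list L such that `select P K L` holds, in clause order."""
    if not k:
        yield []
        return
    x, rest = k[0], k[1:]
    if p(x):
        for tail in select(p, rest):
            yield [x] + tail
    for tail in select(p, rest):
        yield tail

def select_or(p, q, k):
    """Higher-order version: pass the disjunction of P and Q as one predicate."""
    return select(lambda x: p(x) or q(x), k)

def select_or_approx(p, q, k):
    """First-order approximation: answers of select P, then of select Q."""
    yield from select(p, k)
    yield from select(q, k)

def is_even(n):
    return n % 2 == 0

def is_negative(n):
    return n < 0

exact = list(select_or(is_even, is_negative, [2, -1]))
approx = list(select_or_approx(is_even, is_negative, [2, -1]))
```

On the list [2, -1], the disjunctive predicate admits both elements together, whereas the two-clause approximation can only ever return elements that all satisfy P or all satisfy Q.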


First  vs.  higher-order.  When  higher-order  values  are  represented  with  first-order  terms, 
we  often  need  ‘new  variables’,  need  to  check  conditions  such  as  ‘where  x  does  not  occur  in 
M’,  or  must  implement  substitution  in  a  way  that  ‘renames  bound  variables  if  necessary.’ 
In addition, procedures that depend upon the binding operator — e.g., αβη-conversion and
higher-order  unification  —  must  be  explicitly  programmed.  We  claim  that  all  this  makes  for 
a  prohibitively  complex  encoding. 

To  justify  such  a  conjecture,  one  might  attempt  to  formalize  a  given  higher-order  example 
within  a  first-order  language.  However,  at  best  such  a  strategy  could  only  establish  the 
inadequacy  of  one  particular  formulation.  Arguing  that  first-order  encodings  are  generally 
insufficient for higher-order domains is more problematic, because first-order languages
certainly are expressively sufficient (being Turing equivalent). The proper question is, rather,
one of expressive power and elegance — that is, are first-order languages sufficient to
concisely and cleanly program over higher-order domains? While the issue remains open, as
there  exist  no  established  criteria  for  making  a  determination,  we  remain  convinced  that 
higher-order  expressivity  is  crucial  for  many  higher-order  domains. 


Performance. The extent of the overhead incurred through higher-order operations
remains unclear. On the one hand, implementation of αβη-conversion and higher-order
unification within the programming language is generally more efficient than user-programmed
encodings, since an extra layer of language is avoided. On the other hand, the full power
of undecidable higher-order unification is potentially too costly. Yet Huet’s semi-decision
procedure is effective in practice, simply because typical applications of λ-term unification
are more restricted than the worst case.

A subset of λProlog named Lλ is currently being developed by Miller [84]. Lλ restricts
higher-order unification to maintain the attractive properties of first-order unification: namely
decidability and most general unifiers. The overhead of Lλ’s restricted higher-order unification
is not significantly different from that of first-order unification. We further discuss Lλ and its
relevance to this thesis in §10.2.3.


3.4 λProlog

3.4.1  Clauses  and  Goals 

Now we turn to the logical connectives of the language. λProlog expressions are distinguished
based upon whether they appear as a goal G (i.e., a query) or a program clause D (i.e., a
rule or fact), which are each of proposition type o. For Prolog, these two classes may be
inductively defined as

G ::= true | A | G₁ , G₂ | G₁ ; G₂
D ::= true | A | A ⇐ G

where G ranges over goals (also termed G-formulas or G-forms), D over program clauses (or
D-forms), and A over atoms. Atoms are propositional (in the Prolog case, Herbrand) terms
of type o that do not have a logical operator at the top-level: that is, c (A , B) is atomic,
while (c A , B) is not. By Prolog convention, variables within D are implicitly universal,
while those within G are implicitly existential.

The  above  characterizes  the  Horn  clauses  [124],  the  logical  basis  of  Prolog.  Horn  clauses 
may  be  generalized  to  higher-order  Horn  logic  as  follows: 

G ::= true | A | G₁ , G₂ | G₁ ; G₂ | ∃x [: τ]. G

D ::= true | A | D₁ , D₂ | D ⇐ G | ∀x [: τ]. D

where we have replaced Prolog’s Herbrand terms with λProlog’s simply-typed λ-terms.
Atoms now take the form p M₁ M₂ ... Mₙ, where p is a predicate constant and M₁, M₂, ..., Mₙ
are its λ-term arguments (although β-reduction may be required to reach this form).

λProlog is based upon the following generalization of higher-order Horn clauses:

G ::= true | A | G₁ , G₂ | G₁ ; G₂ | D ⇒ G | ∀x [: τ]. G | ∃x [: τ]. G

D ::= true | A | D₁ , D₂ | D ⇐ G | ∀x [: τ]. D
Both ⇒ and ⇐ represent (intuitionistic) implication. Thus G ⇒ D and D ⇐ G stand for
the same formula, where the latter is equivalent to Prolog’s D :- G. Typically, a goal will
be written as D ⇒ G, while a clause will be written as D ⇐ G.

The above classes define the core of λProlog — the higher-order Hereditary Harrop
formulas [100], which generalize Horn clauses while preserving the basic character of a logic
programming language. The logical operators have the following meanings and types (simple
types are also employed to type logical expressions).


,      ‘and’               o → o → o
;      ‘or’ (inclusive)    o → o → o
⇒      ‘implies’           o → o → o
∀      ‘for all’           (A → o) → o
∃      ‘for some’          (A → o) → o

The types for ∀x. P and ∃y. Q arise from their respective representation as pi λx. P and
sigma λy. Q.


3.4.2 An Abstract Interpreter for λProlog


So that readers may arrive at a better understanding of λProlog, we herein provide an
informal operational characterization of the language.⁴ Then within §3.7, we offer a formal
inference system.


Notation: First, we use 𝒫 to denote an arbitrary logic program (set of D's). Next, 𝒫 ⊢ G
denotes that G is a logical consequence of 𝒫 — i.e., that G follows (in the intuitionistic sense
of logic programming) from 𝒫. In order to speak about the state of the logic programming
interpreter, we use 𝒫 ⊩ G to represent the problem of solving G given the program 𝒫.
Thus, while 𝒫 ⊢ G expresses that G is logically valid given 𝒫, 𝒫 ⊩ G denotes that under
a particular interpretation (i.e., using a particular operational procedure for finding logic
programming proofs) G is derivable from 𝒫.

The table below contains a λProlog abstract interpreter, which consists of a series of
backchaining search steps that follow the structure of the pending goal G.


true        𝒫 ⊩ true
and         𝒫 ⊩ G₁ , G₂     only if 𝒫 ⊩ G₁ and 𝒫 ⊩ G₂.
or          𝒫 ⊩ G₁ ; G₂     only if 𝒫 ⊩ G₁ or 𝒫 ⊩ G₂.
augment     𝒫 ⊩ D ⇒ G       only if {D} ∪ 𝒫 ⊩ G.
instance    𝒫 ⊩ ∃x. G       only if 𝒫 ⊩ [M/x]G for some λ-term M.
generic     𝒫 ⊩ ∀x. G       only if 𝒫 ⊩ [c/x]G for a new constant c.
backchain   𝒫 ⊩ A           only if (∀X. A_X ⇐ G_X) ∈ 𝒫, A_X unifies with A, and 𝒫 ⊩ G_X.


4Our  AProlog  abstract  interpreter  is  a  slightly  modified  version  of  Nadathur  &  Miller’s  [100,  p.813]. 
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where  X  is  a  set  of  variables,  VX  represents  quantification  over  that  set,  and  Mx  denotes  a 
A- term  potentially  containing  free  occurrences  of  X. 

For  the  sake  of  intelligibility,  the  abstract  interpreter  above  ignores  details  essential  for  both 
correctness  and  efficiency: 

•  Order  of  evaluation  —  Conjunctive  and  disjunctive  goals  are  ‘evaluated’  from  left  to 
right  in  a  depth-first  fashion.  For  the  or  case,  the  right  branch  need  only  be  solved 
if  the  left  fails  (i.e.,  is  not  satisfiable);  for  and,  the  right  branch  is  only  checked  if 
solution  of  the  left  succeeds. 

•  Type information: To ensure that λ-terms within G are properly typed, a type
signature Σ must be generated for a given 𝒫. Σ is extended as required by the augment,
instance and generic operations.

•  Logical variables: The instance strategy is realized by substituting a new logical
variable of the appropriate type for the unknown λ-term. Subsequent computation
may then fill in the unknown value (via unification).

•  Clause normal-form: The reader may have noticed that the backchain, or rule
application, step relies upon clauses of the form ∀X. A_X ⇐ G_X, rather than the general
form for D's given by the inductive definition. In §3.6 we illustrate a transformation nf
that maps arbitrary D-forms into the normal form D_nf: a conjunction of universally
quantified rules of the form A ⇐ G. More formally, the structure of D_nf is defined as
follows:

    D_nf ::= D_∀ | D_nf , D_nf
    D_∀  ::= D_⇐ | ∀x. D_∀
    D_⇐  ::= A ⇐ G

The program 𝒫 may in this way be mapped to a set of clauses of the form ∀X. A_X ⇐ G_X.
(An atomic clause A becomes A ⇐ true.) Hence the test (∀X. A_X ⇐ G_X) ∈ 𝒫 is
correctly expressed as (∀X. A_X ⇐ G_X) ∈ nf(𝒫).

λProlog follows the Prolog convention of universally closing (i.e., implicitly universally
quantifying the free variables within) the D-forms of the original program. Clauses
added to 𝒫 via the augment step, however, remain open. Thus, solving 𝒫 ⊩ G potentially
alters 𝒫 by instantiating free variables of 𝒫 in the course of solving G: for example,
the λProlog query

    ?- p x ⇒ (p 1, p 2).

fails, as x is instantiated to 1 in the course of solving p 1, which leaves p 2 underivable.

•  Clause unification: The backchain step is implemented by unifying the pending
atomic goal A with the head of a potentially applicable clause ∀X. A_X ⇐ G_X. As in
Prolog, this unification is accomplished by replacing the clause’s universally quantified
variables X with new logical variables Y of the appropriate type. If A_Y unifies with A,
the interpreter attempts the solution of 𝒫 ⊩ G_Y, where G_Y is typically partially
instantiated by the unification of A and A_Y.

•  Deterministic clause selection: Finally, because the backchain strategy must have
a means to enumerate clauses effectively, the logic program 𝒫 is in reality a list rather
than a set of D-forms. (The additions to 𝒫 made through the augment step are
effectively inserted at the head of this list.) A depth-first backtracking strategy is then
taken to search for proofs of a given query.

•  Atoms with variable heads: The higher-order Hereditary Harrop formulas disallow
certain atoms (those occurring in negative positions) from having a variable predicate
[100]. For example, the goal x M_1 … M_n ⇒ G is prohibited, because its solution
could result in the assumption of a clause with a variable head (which would then be
universally applicable). This restriction is not, however, strictly enforced by our
abstract interpreter, nor for that matter within λProlog. Instead, goals and clauses may
contain a variable predicate such as the assumption x M_1 … M_n, so long as x is
instantiated before the interpreter attempts to solve a goal x M_1 … M_n or assume a clause
with head x M_1 … M_n. This generalization is supported by current λProlog implementations.
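The search steps of the abstract interpreter can be sketched in Python for the propositional fragment only. This is an illustrative sketch under stated assumptions, not the eLP implementation: the function name `solve`, the tuple tags and the `depth` guard are all hypothetical, clauses are assumed to be already in normal form (a head atom paired with a body goal), and the instance, generic and unification machinery is omitted entirely.

```python
# Goals:   "true" | ("and", G1, G2) | ("or", G1, G2) | ("imp", D, G) | atom (a str)
# Clauses: (head_atom, body_goal), i.e. the normal form  A <= G  (assumed encoding)

def solve(program, goal, depth=50):
    """Return True if `goal` is derivable from `program` (an ordered list of clauses)."""
    if depth == 0:                       # crude guard against infinite recursion
        return False
    if goal == "true":                   # true step
        return True
    if isinstance(goal, tuple):
        tag = goal[0]
        if tag == "and":                 # left to right; right only if left succeeds
            return (solve(program, goal[1], depth - 1)
                    and solve(program, goal[2], depth - 1))
        if tag == "or":                  # right branch tried only if left fails
            return (solve(program, goal[1], depth - 1)
                    or solve(program, goal[2], depth - 1))
        if tag == "imp":                 # augment: assumed clause goes to the list head
            return solve([goal[1]] + program, goal[2], depth - 1)
        raise ValueError(f"unknown goal form: {goal!r}")
    # backchain: try clauses in list order, depth first
    return any(head == goal and solve(program, body, depth - 1)
               for head, body in program)


program = [("q", ("and", "p", "r")), ("r", "true")]
print(solve(program, "q"))                          # p is not derivable
print(solve(program, ("imp", ("p", "true"), "q")))  # p is assumed while solving q
```

Because the program is a Python list and `any` evaluates lazily, clause order and left-to-right goal order determine which proof is found first, mirroring the deterministic clause selection described in the list above.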


3.4.3 λProlog Implementations

This dissertation employs eLP, an implementation of λProlog developed by Conal Elliott and
Frank Pfenning in the framework of the Ergo project at Carnegie Mellon University [38]. eLP
is written in Common Lisp and relies upon syntax tools of the Ergo Support System [77].
The examples presented within this thesis have each been run under eLP (or under an
extended version developed herein).

A substantially more efficient implementation of λProlog within Standard ML is currently
being completed by Pfenning. Yet another new λProlog implementation is being developed
by Pascal Brisset at IRISA in France (brisset@irisa.fr).


3.5 Example: Lists & Mapping

In the next two sections, we present two more-extended examples of λProlog. The latter
one, in particular, will be relevant to subsequent discussion.

The  module  within  Figure  3.1  implements  three  simple  operations  over  lists.  List  terms 
are  built  with  the  constructors  nil  and  ::  (or  ‘cons’).  Simple  types  for  lists  are  produced 
by  the  list  type  constructor:  for  example,  list  int  is  the  type  of  integer  lists,  while  list  A 
stands  for  homogeneous  lists  of  an  arbitrary  type  A.  (Note  that  the  latter  does  not  admit 
inhomogeneous  lists  —  i.e.,  lists  with  elements  of  more  than  one  type.) 

    module lists.

    kind  list     type → type.

    type  nil      list A.
    type  ::       A → list A → list A.
    type  member   A → list A → o.
    type  append   list A → list A → list A → o.
    type  nth      int → A → list A → o.

    member x (x :: L) ⇐ !.
    member x (y :: L) ⇐ member x L.

    append nil K K.
    append (x :: L) K (x :: M) ⇐ append L K M.

    nth 0 x (y :: L) ⇐ !, x = y.
    nth n x (y :: L) ⇐ m is n − 1, nth m x L.


Figure 3.1: List functions


The member function determines whether its first argument (of type A) occurs within its
second argument (of type list A). The equivalence test employed by member is λ-term
unification. Cut (!) is a special side-effecting goal that commits the interpreter to all choices
made since the selection of the clause containing the cut.⁵,⁶ In member, ! prohibits looking
further in the list once the given λ-term has been found. Herein then, ! is used primarily to
improve efficiency rather than to change behavior.

The  append  predicate  determines  whether  its  first  two  list  arguments  may  be  appended  to 
yield  its  third.  Since  append  is  defined  as  a  predicate  (in  the  ‘logic  programming  style’) 
rather  than  a  function,  it  may  be  invoked  with  all  arguments  instantiated  (as  a  ‘check’),  or 
with  any  pair  instantiated,  in  which  case  append  determines  whether  there  exists  a  third 
list  such  that  the  ‘append’  relationship  holds. 

Finally, nth finds the nth element of a list.⁷ The reason the first clause takes the given form
rather than

    nth 0 x (x :: L) ⇐ !.

is that the latter would continue down the list past the nth element y in the case that x ≠ y.

The lists module could well have been presented within Prolog, as it does not make use
of λProlog’s higher-order features. The same cannot quite be said of the mapping
functions contained within Figure 3.2. Of particular interest is the application of predicate P
in map_pred, for_every and for_some, without Prolog’s somewhat encumbering apply
syntax. It is also important to note that the function f in map_fun and reduce is simply a
λ-term; f is not a named procedure in the traditional sense of Prolog and other languages.
(In higher-order languages such as λProlog, functions can be explicitly constructed; within
first-order languages, functions can only be simulated as in apply p x.)


    module maps.
    import lists.

    type  map_fun     (A → B) → (list A) → (list B) → o.
    type  map_pred    (A → B → o) → (list A) → (list B) → o.
    type  reduce      (A → B → B) → (list A) → B → B → o.
    type  for_every   (A → o) → (list A) → o.
    type  for_some    (A → o) → (list A) → o.

    map_fun f nil nil.
    map_fun f (x :: L) ((f x) :: K) ⇐ map_fun f L K.

    map_pred P nil nil.
    map_pred P (x :: L) (y :: K) ⇐ P x y, map_pred P L K.

    reduce f nil x x.
    reduce f (u :: L) x (f u y) ⇐ reduce f L x y.

    for_every P nil.
    for_every P (x :: L) ⇐ P x, for_every P L.

    for_some P (x :: L) ⇐ P x.
    for_some P (x :: L) ⇐ for_some P L.


Figure 3.2: Mapping functions


⁵ This includes λProlog’s selection of alternative higher-order unifiers.

⁶ For a thorough description of cut, see Sterling & Shapiro [124].

⁷ The is construct represents assignment: For m is n − 1, m is set to the numeric value that results from
evaluating n − 1. is differs from ‘=’ in that the latter does not evaluate its arguments: m = n − 1 sets m to
be the symbolic expression n − 1.
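For readers more familiar with functional languages, the clauses of Figure 3.2 can be read as the following Python sketch. It captures only the functional reading (inputs instantiated, output computed); the relational modes of the λProlog predicates are lost. The name `reduce_right` is an assumption chosen to stress that the reduce clauses describe a right fold.

```python
def map_fun(f, xs):
    """map_fun f (x :: L) ((f x) :: K) <= map_fun f L K."""
    return [f(x) for x in xs]

def map_pred(p, xs, ys):
    """Check that p relates the lists elementwise (the 'check' mode only)."""
    return len(xs) == len(ys) and all(p(x, y) for x, y in zip(xs, ys))

def reduce_right(f, xs, x):
    """reduce f nil x x.   reduce f (u :: L) x (f u y) <= reduce f L x y."""
    if not xs:
        return x
    return f(xs[0], reduce_right(f, xs[1:], x))

def for_every(p, xs):
    return all(p(x) for x in xs)

def for_some(p, xs):
    return any(p(x) for x in xs)


print(map_fun(lambda x: x + 1, [1, 2, 3]))
print(reduce_right(lambda u, y: u - y, [1, 2, 3], 0))   # 1 - (2 - (3 - 0))
```

Here the argument `f` is an anonymous function value, which parallels the point that f in map_fun and reduce is simply a λ-term rather than a named procedure.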


3.6 Example: Clause Normal-Form

The nform module of Figure 3.3 transforms arbitrary D-forms into the more restricted D_nf
given in §3.4.2. Recall that D_nf is defined as

    D_nf ::= D_∀ | D_nf , D_nf
    D_∀  ::= D_⇐ | ∀x. D_∀
    D_⇐  ::= A ⇐ G

The nf mapping is required by the abstract interpreter presented in §3.4.2. In fact, nf is
used by the eLP implementation for the clauses of the initial logic program 𝒫, as well as
the clauses added to 𝒫 in the course of solving goals of the form D ⇒ G. Moreover, when
we later extend λProlog with additional logical connectives, nf will serve as the basis of a
new clause normal-form. Finally, nform is illustrative of D-form and G-form manipulation
in general, a recurring theme within more complex λProlog applications to come.

This normal-form conversion is captured by the following distributive transformations over
D-forms:

    (1)  ∀x. (D_1 x , D_2 x)   ⟹   (∀x. D_1 x) , (∀x. D_2 x)

    (2)  (D_1 , D_2) ⇐ G       ⟹   (D_1 ⇐ G) , (D_2 ⇐ G)

    (3)  (∀x. D x) ⇐ G         ⟹   ∀x. (D x ⇐ G)    (provided x not free in G).

    (4)  (D ⇐ G_1) ⇐ G_2       ⟹   D ⇐ (G_1 , G_2)

Since D_nf requires conjunctions to be at the top level, transformations (1) and (2)
redistribute ∀ and ⇐ inside of ‘,’. Similarly, (3) redistributes ⇐ inside of ∀. Finally, (4)
collapses nested implications. Appropriate sequences of the above transformations map
arbitrary D-forms to D_nf-forms. The proof is by induction over cases.

Before further exploring these transformations, there are two points to reinforce concerning
λProlog’s higher-order notation. First, when a bound variable x is not included as a potential
argument to a variable function G as in ∀x. (D x ⇐ G), then x is not permitted to appear
free within G; that is, the free variable restriction following (3) is already captured within
the notation. Second, since ∀x. D x is represented as ∀ (λx. D x) where D is itself necessarily
a λ-expression, it makes sense to express the whole simply as ∀ D.

For example, through the above transformations the following D-forms on the left are
mapped to the normal D-forms on the right:

    q                              q ⇐ true

    p ⇒ r ⇒ q                      q ⇐ (p, r)

    p ⇒ ∀x. (r x ⇒ q)              ∀x. q ⇐ (p, r x)

    p ⇒ ∀x. (r x ⇒ (q, s x))       ∀x. q ⇐ (p, r x),
                                   ∀x. s x ⇐ (p, r x)

Within the code, ndform is the top-level routine; requantify redistributes a universal
quantification ‘below’ top-level conjunctions; and conjoin redistributes ⇐ below ‘,’ and ∀,
by adding the subgoal G to the preconditions of each nested D-form.

It is important to recognize that ∀ is used both as a data constructor to encode D-forms,
and as a logical connective: for example, the ∀x within

    conjoin G (∀D) (∀D') ⇐ ∀x. conjoin G (D x) (D' x)

If instead the above were simply

    conjoin G (∀D) (∀D') ⇐ conjoin G (D x) (D' x)

the variable x could be instantiated in the course of the computation. The explicit
quantification acts as a guard ensuring that x remains universal.
    module nform.

    type  ndform       o → o → o.
    type  requantify   o → o → o.
    type  conjoin      o → o → o → o.

    requantify (∀x. D1 x , D2 x) (D1' , D2')  ⇐  !, requantify (∀x. D1 x) D1',
                                                    requantify (∀x. D2 x) D2'.
    requantify D D.

    conjoin G (D1 , D2) (D1' , D2')  ⇐  conjoin G D1 D1',
                                        conjoin G D2 D2'.
    conjoin G (∀D) (∀D')             ⇐  ∀x. conjoin G (D x) (D' x).
    conjoin G (A ⇐ true) (A ⇐ G).
    conjoin G (A ⇐ G1) (A ⇐ (G, G1)).

    ndform (D1 , D2) (D1' , D2')  ⇐  !, ndform D1 D1',
                                        ndform D2 D2'.
    ndform (∀D) D''               ⇐  !, (∀x. ndform (D x) (D' x)),
                                        requantify (∀D') D''.
    ndform (D ⇐ G) D''            ⇐  !, ndform D D',
                                        conjoin G D' D''.
    ndform A (A ⇐ true).


Figure 3.3: Clause normal-form conversion.
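The effect of the four distributive transformations can be sketched in Python as a single recursive pass. This is an illustration only, not a transcription of the nform module: the function name `nf`, the tuple tags and the result shape (a list of triples) are assumptions. Bound variables are represented by name rather than by λ-abstraction, so the sketch assumes that bound names are distinct and do not occur free in any accumulated precondition (the side condition of transformation (3)); it performs no check of this.

```python
# D-forms (assumed encoding):
#   atom                 a str such as "q" or "s x" (treated as opaque)
#   ("and", D1, D2)      D1 , D2
#   ("all", x, D)        forall x. D        (x is a variable name)
#   ("if", D, G)         D <= G             (G is an opaque goal)
# Result: list of normal clauses (bound_vars, head, preconditions);
#         an empty tuple of preconditions stands for the body `true`.

def nf(d, goals=(), bound=()):
    if isinstance(d, tuple) and d[0] == "and":
        # (1), (2): conjunctions rise to the top level
        return nf(d[1], goals, bound) + nf(d[2], goals, bound)
    if isinstance(d, tuple) and d[0] == "all":
        # (3): the quantifier moves outside the accumulated preconditions
        return nf(d[2], goals, bound + (d[1],))
    if isinstance(d, tuple) and d[0] == "if":
        # (4): nested implications collapse, outer precondition first
        return nf(d[1], goals + (d[2],), bound)
    return [(bound, d, goals)]           # atomic head


# p => forall x. (r x => (q, s x)), written with <= as in the encoding above
d = ("if", ("all", "x", ("if", ("and", "q", "s x"), "r x")), "p")
for clause in nf(d):
    print(clause)
```

The preconditions are accumulated outermost first, which agrees with the ordering (p, r x) shown in the example table of this section.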
3.7 Inference System for λProlog


We may formalize the operational interpretation of λProlog given in §3.4.2 within an
inference system. Our justification for doing this is twofold: (1) formal inference rules clarify the
preceding informal description, and (2) such a system may be employed to prove properties
of λProlog. The latter justification comes out of our need to establish the validity of the λProlog
extensions we propose in Chapter 5. The casual reader may skip this section, although the
more formal discussions of Chapter 5 are then likely to be impenetrable.

Substitutions. A substitution is a mapping from variables to λ-terms. We represent the
application of a substitution θ to an expression M (containing free variables) as θM, and
we use ψθ for the composition of substitutions ψ ∘ θ. Substitution binds less tightly than
λ-term application; that is, θMN = θ(MN). Substitutions are applied to programs as well
as to goals, because, as previously mentioned, 𝒫 may contain free variables that become
instantiated in the course of goal solution.

Recall the λ-term reduction rules defined in §3.2:

    (λx.M)N  ⇒_β  [N/x]M

    λx.Mx    ⇒_η  M        provided x not free in M

As was stated then, for the simply-typed λ-calculus, maximal sequences of β and η reductions
applied to a given λ-term terminate with a unique λ-term said to be in βη-normal form. For
the remainder of this discussion, we assume that all λ-terms are βη-normal. This means
that substitution θM and λ-term application MN are followed by βη-conversion to normal
form.

Further notation: θ\x means θ ‘without’ x (i.e., θ\x(x) = x). Also, free(M) denotes the
free variables of λ-term M, dom(θ) represents the set of variables bound by substitution θ,
and ran(θ) is the set of λ-terms to which substitution θ maps dom(θ). One substitution ψ
is said to be an instance of another θ if there exists a further substitution σ such that,
for all M, ψM = σθM (i.e., θ is more general, or less defined, than ψ).

Finally, we say that a given substitution θ is minimal, or most general, with respect to a
particular set of conditions, if θ satisfies those conditions, and if for any other substitution
ψ also satisfying those conditions, ψ is an instance of θ.
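The substitution notation can be made concrete with a small Python sketch over first-order terms. The encoding is an assumption made for illustration: variables and constants are strings, compound terms are tuples, and a substitution is a dict keyed by variable name. No βη-normalization is modeled, so this covers only the first-order reading of θM, ψθ and θ\x.

```python
def apply(theta, term):
    """theta M: replace each variable in dom(theta) simultaneously."""
    if isinstance(term, tuple):
        return tuple(apply(theta, t) for t in term)
    return theta.get(term, term)

def compose(psi, theta):
    """psi theta: the substitution that applies theta first, then psi."""
    out = {v: apply(psi, t) for v, t in theta.items()}
    for v, t in psi.items():
        out.setdefault(v, t)             # bindings of psi not shadowed by theta
    return out

def without(theta, x):
    """theta\\x: theta with any binding for x removed, so (theta\\x)(x) = x."""
    return {v: t for v, t in theta.items() if v != x}


theta = {"X": ("f", "Y")}
psi = {"Y": "a", "X": "b"}
m = ("g", "X", "Y")
print(apply(compose(psi, theta), m))
print(apply(psi, apply(theta, m)))       # same term: (psi theta) M = psi (theta M)
```

In this encoding ψ is an instance of θ exactly when some `sigma` satisfies `compose(sigma, theta) == psi` on the variables of interest; the sketch does not search for such a `sigma`.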


The inference system. In order to more accurately speak about the state of a logic
programming interpretation, we use 𝒫 ⊩_θ G to denote that G may be solved under 𝒫 with
a particular substitution θ. Figure 3.4 contains a list of inference rules formally defining the
⊩ relation. Following the structure of the abstract interpreter presented within §3.4.2, the
inference system is ‘goal-based’: the solution of a particular G-form (below the line) is
reduced to the solution of one or more subgoals (above the line). For example, the rule

    𝒫 ⊩_θ G_1      θ𝒫 ⊩_ψ θG_2
    -----------------------------
         𝒫 ⊩_ψθ G_1 , G_2

describes the reduction of a conjunctive goal (G_1 , G_2) into two subgoals, where G_1 is solved
with substitution θ, and then θ is applied in the solution of G_2. Similarly,

         𝒫 ⊩_θ G y
    --------------------
    𝒫 ⊩_θ ∀x:τ. G x         where y ∉ free(θG), y ∉ free(θ𝒫), and y ∉ dom(θ)

specifies that ∀x:τ. G x is reduced to solving G y for any y (since y is not bound by any
substitution and does not appear free in 𝒫).


    ------------
    𝒫 ⊩_θ true

    𝒫 ⊩_θ G_1      θ𝒫 ⊩_ψ θG_2
    -----------------------------
         𝒫 ⊩_ψθ G_1 , G_2

        𝒫 ⊩_θ G_1                    𝒫 ⊩_θ G_2
    ------------------           ------------------
    𝒫 ⊩_θ G_1 ; G_2              𝒫 ⊩_θ G_1 ; G_2

         𝒫 ⊩_θ G y
    --------------------
    𝒫 ⊩_θ ∀x:τ. G x         where y ∉ free(θG), y ∉ free(θ𝒫), and y ∉ dom(θ).

         𝒫 ⊩_θ G y
    ----------------------
    𝒫 ⊩_θ\y ∃x:τ. G x       where y ∉ free(θG), and y ∉ free(θ𝒫).

    {D} ∪ 𝒫 ⊩_θ G
    ----------------
    𝒫 ⊩_θ D ⇒ G

    (∀X. A_X ⇐ G_X) ∈ nf(𝒫)      θσ_X A_X = θA      θ𝒫 ⊩_ψ θσ_X G_X
    ------------------------------------------------------------------
                               𝒫 ⊩_ψθ A

                            where dom(σ_X) ⊆ X,
                            dom(θ) ∩ X = ∅,
                            and σ_X & θ are most general.

Figure 3.4: Inference rules for λProlog.

While 𝒫 ⊩_θ G pertains to the particular inference system being defined, we use 𝒫 ⊢ G to
instead denote that there generally exists an intuitionistic proof of G given 𝒫. Logic
programming is intuitionistic or constructive in nature [87, 86]: for example, 𝒫 ⊢ G_1 ; G_2
only if either 𝒫 ⊢ G_1 or 𝒫 ⊢ G_2. This disallows the derivation of classical tautologies such as
⊢ p ; ¬p and ⊢ (p ⇒ q) ⇒ (¬p ; q). (The ¬ operator stands for logical negation, which is
not currently supported by λProlog.) The restriction to an intuitionistic system (as opposed
to a classical one) is so that the logic admits an effective proof procedure such as ⊩. In fact,
D-forms are restricted from containing disjunction (;) and existential quantification (∃),
precisely because of the difficulty in giving an operational interpretation to such clauses.⁸

3.7.1  Soundness 

The inference system ⊩ defined within Figure 3.4 is correct, or sound, in that only valid
goals may be inferred: that is, 𝒫 ⊩_θ G entails θ𝒫 ⊢ θG. The soundness of ⊩ may be proved
by a straightforward (albeit tedious) induction over an arbitrary derivation 𝒫 ⊩_θ G. More
precisely, given 𝒫 ⊩_θ G, we seek to show that θ𝒫 ⊢ θG by induction over the sequence of
inference steps used to establish G:⁹

Basis.

Trivially, 𝒫 ⊩_θ true implies θ𝒫 ⊢ true.


Induction step: if 𝒫 ⊩_θ G, then θ𝒫 ⊢ θG. Proof is by cases.

Given 𝒫 ⊩_ψθ G_1 , G_2.

By definition, 𝒫 ⊩_θ G_1 and θ𝒫 ⊩_ψ θG_2.

From the ind.hyp. (i.e., the induction hypothesis), θ𝒫 ⊢ θG_1 and ψθ𝒫 ⊢ ψθG_2.

Since ψθ𝒫 ⊢ ψθG_1 follows from θ𝒫 ⊢ θG_1, ψθ𝒫 ⊢ (ψθG_1 , ψθG_2).

Hence ψθ𝒫 ⊢ ψθ(G_1 , G_2).

Given 𝒫 ⊩_θ G_1 ; G_2.

By definition, 𝒫 ⊩_θ G_1 or 𝒫 ⊩_θ G_2.

From the ind.hyp., θ𝒫 ⊢ θG_1 or θ𝒫 ⊢ θG_2.

Hence θ𝒫 ⊢ (θG_1 ; θG_2), and then θ𝒫 ⊢ θ(G_1 ; G_2).

Given 𝒫 ⊩_θ ∀x. G x.

By definition, 𝒫 ⊩_θ G y where y ∉ free(θG), y ∉ free(θ𝒫), and y ∉ dom(θ).

From the ind.hyp., θ𝒫 ⊢ θG y.

Since y ∉ dom(θ), θ(G y) = (θG) y.

By universal generalization (applicable because of restrictions),
θ𝒫 ⊢ ∀x. (θG) x for new variable x.

Hence θ𝒫 ⊢ θ(∀x. G x).

⁸ For Horn logic and even higher-order Horn logic, intuitionistic and classical provability coincide [86], so
the by-word ‘intuitionistic’ is only important for logics extended with embedded implication and embedded
universal quantification, such as λProlog.

⁹ A thorough soundness argument would also require that we establish the validity of λ-term substitution
and βη-conversion. Instead, for the purposes of this discussion, we take as given that the underlying
simply-typed λ-calculus operations preserve the soundness of ⊩.

Given 𝒫 ⊩_θ\y ∃x. G x.

By definition, 𝒫 ⊩_θ G y where y ∉ free(θG) and y ∉ free(θ𝒫).

From the ind.hyp., θ𝒫 ⊢ θG y.

From the restrictions on y, (θ\y)𝒫 ⊢ ((θ\y)G)(θy).

By existential generalization (again applicable due to restrictions),

(θ\y)𝒫 ⊢ ∃x. ((θ\y)G) x for new variable x.

Hence (θ\y)𝒫 ⊢ (θ\y)∃x. G x.

Given 𝒫 ⊩_θ D ⇒ G.

By definition, {D} ∪ 𝒫 ⊩_θ G.

From the ind.hyp., {θD} ∪ θ𝒫 ⊢ θG.

By implication introduction, θ𝒫 ⊢ θD ⇒ θG.

Hence θ𝒫 ⊢ θ(D ⇒ G).

Given 𝒫 ⊩_ψθ A.

By definition, there exists D ∈ nf(𝒫) such that

D = (∀X. A_X ⇐ G_X), θσ_X A_X = θA, and θ𝒫 ⊩_ψ θσ_X G_X,
dom(σ_X) ⊆ X, and dom(θ) ∩ X = ∅.

From the ind.hyp., ψθ𝒫 ⊢ ψθσ_X G_X. (1)

Since D ∈ nf(𝒫), it follows that 𝒫 ⊢ (∀X. A_X ⇐ G_X),
which in turn has instance ψθ𝒫 ⊢ ψθ(∀X. A_X ⇐ G_X).
(This step relies upon the validity of our normal form mapping;
i.e., that for all D' ∈ nf(𝒫), it is necessarily the case that 𝒫 ⊢ D'.)

By universal instantiation (via σ_X), it follows that ψθ𝒫 ⊢ ψθσ_X(A_X ⇐ G_X),
and then by distributivity of substitution, ψθ𝒫 ⊢ ψθσ_X A_X ⇐ ψθσ_X G_X. (2)

By modus ponens over (1) and (2), ψθ𝒫 ⊢ ψθσ_X A_X.
From the definition, θσ_X A_X = θA.

Hence ψθ𝒫 ⊢ ψθA.

3.7.2  Completeness 

The dual theorem of soundness, completeness, typically fails for logic programs: even if
θ𝒫 ⊢ θG, the interpretation may fail to terminate in attempting to solve G. The source of
this incompleteness is (1) the depth-first search of the logic programming paradigm, which
may lead to infinite recursion in the solution of an otherwise valid goal (by selecting the
‘wrong’ branch of a disjunction or the ‘wrong’ clause from the program), and (2) λProlog’s
higher-order unification. One may, instead, speak of nondeterministic completeness: that
is, given nondeterminism for logical and unification choice points, the remainder of the task
is complete. In fact, the inference rules of Figure 3.4 are nondeterministically complete as
they make no commitment to order of evaluation. As this result is not particularly relevant
to this thesis, we refer the interested reader to a similar discussion in Miller et al. [86].
Chapter 4


Explanation-Based Generalization


4.1 Generalization

Generalizations may be strictly syntactic, such as the replacing of a subexpression with a
variable, or generalizations may be arbitrarily ‘deep’ semantically, such as making a rule
applicable to a new set of situations. The space of possible generalizations is limited only
by the expressiveness of the language in which results are phrased. For our purposes, the
language of expressions and that of generalized expressions are one and the same: λProlog.

Inductive vs. analytical generalization. One class of approaches to the generalization
problem is characterized as inductive or similarity-based [1, 30]. Similarity-based methods
examine a set of instances of the desired concept. Typically, syntactic operations are then
employed to derive a generalization covering those instances: e.g., a pattern that matches
each of them. Moreover, similarity-based approaches often make use of negative examples
(i.e., examples that are not instances of the desired concept) to keep the result from becoming
over-general. We further discuss the inductive paradigm within §4.8.
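As one concrete instance of such a syntactic operation, the Python sketch below computes a pattern that matches two first-order terms by anti-unification (a least general generalization). The choice of anti-unification is ours, made for illustration, and is not a method attributed to the cited work; the function name `lgg`, the tuple encoding of terms and the generated variable names `X0`, `X1`, ... are all assumptions.

```python
def lgg(s, t, table=None):
    """Return a pattern covering both s and t (terms are str or tuples)."""
    table = {} if table is None else table
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and len(s) == len(t) and s[0] == t[0]):
        # same symbol and arity: generalize argument by argument
        return (s[0],) + tuple(lgg(a, b, table) for a, b in zip(s[1:], t[1:]))
    # mismatch: introduce a variable, reusing it for a repeated mismatch pair
    return table.setdefault((s, t), f"X{len(table)}")


print(lgg(("kill", "john", "john"), ("kill", "mary", "mary")))
print(lgg(("kill", "john", "john"), ("kill", "mary", "fred")))
```

Reusing one variable for a repeated pair of mismatched subterms is what lets the first call preserve the fact that both arguments coincide, whereas the second call yields two distinct variables.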

The alternative approach to generalization is analytical. Analytical methods determine what
and how to generalize by employing a theory of the problem domain. As a result, analytical
methods generalize from a single example, rather than from the multiple instances required
by inductive paradigms. To date, work on analytical techniques relies predominantly upon
the mechanism of explanation-based generalization (EBG) and its variants [95, 25, 92, 40].
EBG abstracts a particular problem solution (i.e., a proof or explanation), yielding an
encapsulation of that solution: that is, a derived rule that more efficiently solves the original
as well as related problems. While the proof-based generalizations of EBG are necessarily
valid (with respect to the domain theory), similarity-based generalizations are guaranteed
only to the extent that they cover the given examples.

Because the generalization space tends to be very large for any sufficiently rich language,
similarity-based methods frequently employ a bias to determine how to generalize. (The
simplest bias is to restrict the language in which results may be expressed.) One view of this
bias is that it provides an analytical component to an otherwise similarity-based technique.
When this bias is so strong that a single example is sufficient for generalization, the method
becomes analytical. Of particular interest currently is the combination of inductive and
analytical methods [65, 114].¹


Logic programming. Recently, logic programming has been used as a foundation for
EBG [71, 108, 64, 8]. When explanation-based generalization is realized within the logic
programming paradigm, the derivation of a goal produces as a byproduct the most
general goal and the sufficient conditions for which the same sequence of proof steps (or logic
programming search path) would apply. One argument put forward in favor of the logic
programming framework is that it admits a common representation for all aspects of EBG:
domain theory, training instance, query, derived rule, operationality criteria, etc. (These
concepts are defined later in this chapter.) This helps in explicating the underlying
principles in a uniform way and clarifies semantic issues.

One of the primary contributions of this thesis is the development of higher-order EBG,
which is realized within the framework of λProlog. Our formulation of EBG is also unique
in that it employs the modal logic operator □ to express the bias upon which EBG is founded.

4.2  First-order  EBG 

This section introduces first-order EBG within the logic programming framework. As it also
introduces concepts unique to our formulation of EBG, it should be worthwhile even for
readers familiar with the topic.

We begin by briefly illustrating explanation-based generalization with a first-order example
from DeJong & Mooney [25, pp. 158-166]. (We apologize to any readers offended by the
morbidity of this example, but it has become standard in the literature.) EBG divides the
theory of the problem domain between a domain theory, which we also denote with 𝒟:

    kill a b ⇐ hate a b, possess a c, weapon c.
    hate w w ⇐ depressed w.
    possess u v ⇐ buy u v.
    weapon z ⇐ gun z.

and a training theory or 𝒯:

    depressed john.
    buy john obj1.
    gun obj1.

Both 𝒟 and 𝒯 are composed of λProlog clauses. For readers familiar with EBG, 𝒯 roughly
corresponds to training instance. Justification for the new terminology is given within §4.3.

¹ Hirsh, for instance, couples explanation-based and similarity-based methods by using version spaces (an
inductive technique developed by Mitchell [93]) to generalize the results of EBG [65].
    kill john john
    --------------
    kill a b ⇐ hate a b, possess a c, weapon c.
    (a = john, b = john)

        hate john john
        --------------
        hate w w ⇐ depressed w.
        (w = john)

            depressed john
            --------------
            depressed john.

        possess john c
        --------------
        possess u v ⇐ buy u v.
        (u = john, v = c)

            buy john c
            ----------
            buy john obj1.
            (c = obj1)

        weapon c
        --------
        weapon z ⇐ gun z.
        (z = c)

            gun c
            -----
            gun obj1.
            (c = obj1)


Figure 4.1: First-order proof.


    kill x y
    --------
    kill a b ⇐ hate a b, possess a c, weapon c.
    (a = x, b = y)

        hate x y
        --------
        hate w w ⇐ depressed w.
        (w = x = y)

            depressed x
            -----------

        possess x c
        -----------
        possess u v ⇐ buy u v.
        (u = x, v = c)

            buy x c
            -------

        weapon c
        --------
        weapon z ⇐ gun z.
        (z = c)

            gun c
            -----


Figure 4.2: First-order generalized proof.


The EBG algorithm is additionally provided with a query, or goal, such as

    ?- kill john john.

EBG then requires a proof that solves the given query. Within the logic programming
paradigm, such a proof may be expressed as a trace of λProlog computation. A proof of the
above query is illustrated within Figure 4.1. Goals of the proof are underlined, while the
program clause that reduces a particular goal appears underneath. In the course of applying
each clause, its variables may be unified with constants or variables of the goal, resulting in
the given unification constraints (enclosed in ‘()’).

EBG generalizes this explanation to produce an encapsulation of the employed proof strategy.
In Figure 4.2, a generalized proof is constructed that corresponds to the original, except
that clauses of T (or T-clauses) are omitted. This forms EBG’s bias in the generalization
space: the proof of the given query is generalized by abstracting steps involving clauses of
the training theory. At the root of the new proof is a generalized query, which may be
derived from the original by replacing each of the first-order constants with a variable: the
goal kill john john becomes the general goal kill x y. Clauses of D (or D-clauses) applied
in the first proof are correspondingly applied in the second. This restricts the outcome
by propagating unification constraints through the proof (e.g., kill x y becoming kill x x).
Leaves of the generalized proof (e.g., gun c) correspond to subgoals of the original proof that
were derived from T. These leaves are accumulated in a conjunction of conditions sufficient
to establish the generalized query:

kill x x <= depressed x, buy x c, gun c.

We will frequently refer to the resulting proof encapsulation as a derived rule, as an
explanation-based generalization, or simply as a generalization.
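As a rough illustration, the generalization procedure described above might be sketched in Python as follows. The encoding is an assumption of this sketch and not part of the formulation itself: atoms are tuples, variables are strings beginning with "?", and the names `unify`, `prove`, and `derive_rule` are hypothetical. The sketch solves the concrete query and, in lockstep, replays only the D-steps on the generalized query, collecting as leaves the generalized goals that were solved by T-clauses.

```python
import itertools

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, s):
    # Follow bindings until reaching a non-variable or an unbound variable.
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Extend substitution s so that a and b become equal, or return None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def resolve(t, s):
    t = walk(t, s)
    return tuple(resolve(x, s) for x in t) if isinstance(t, tuple) else t

# Domain theory D: (head, body) pairs that are replayed in the generalized proof.
D = [
    (("kill", "?a", "?b"), [("hate", "?a", "?b"), ("possess", "?a", "?c"), ("weapon", "?c")]),
    (("hate", "?w", "?w"), [("depressed", "?w")]),
    (("possess", "?u", "?v"), [("buy", "?u", "?v")]),
    (("weapon", "?z"), [("gun", "?z")]),
]
# Training theory T: facts whose use is abstracted away.
T = [("depressed", "john"), ("buy", "john", "objl"), ("gun", "objl")]

_fresh = itertools.count()

def rename(head, body):
    # Give each clause application its own copy of the clause variables.
    n = next(_fresh)
    def r(t):
        if is_var(t):
            return f"{t}_{n}"
        return tuple(r(x) for x in t) if isinstance(t, tuple) else t
    return r(head), [r(g) for g in body]

def prove(goal, gen, s, gs, leaves):
    """Solve `goal` under s while replaying only D-steps on `gen` under gs."""
    for fact in T:
        s1 = unify(goal, fact, s)
        if s1 is not None:
            yield s1, gs, leaves + [gen]   # T-step: gen stays behind as a leaf
    for clause in D:
        head, body = rename(*clause)
        s1, gs1 = unify(goal, head, s), unify(gen, head, gs)
        if s1 is not None and gs1 is not None:
            yield from prove_all(body, s1, gs1, leaves)

def prove_all(goals, s, gs, leaves):
    if not goals:
        yield s, gs, leaves
        return
    for s1, gs1, l1 in prove(goals[0], goals[0], s, gs, leaves):
        yield from prove_all(goals[1:], s1, gs1, l1)

def derive_rule(goal, gen):
    """Return (head, body) of the derived rule for the first proof found."""
    _, gs, leaves = next(prove(goal, gen, {}, {}, []))
    names = {}
    def canon(t):
        # Rename surviving variables to ?v0, ?v1, ... in order of appearance.
        if is_var(t):
            return names.setdefault(t, f"?v{len(names)}")
        return tuple(canon(x) for x in t) if isinstance(t, tuple) else t
    head = canon(resolve(gen, gs))
    return head, [canon(resolve(leaf, gs)) for leaf in leaves]

head, body = derive_rule(("kill", "john", "john"), ("kill", "?x", "?y"))
print(head, "<=", body)
```

Under these assumptions the derived head has the same variable in both argument positions, mirroring the step from kill x y to kill x x in the text. The sketch omits the occurs check and any treatment of higher-order terms.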


4.3  Modal  Logic 

Our formulation of EBG depends upon the separation of D and T, since only rules of
the former are incorporated within generalized proofs. To differentiate the two, we prefix
D-clauses with the □ operator, which is borrowed from modal logic, that is, from logics in which
propositions have multiple levels or modes of truth, such as ‘may be’ and ‘must be.’2

We illustrate our use of □ on the first-order example of §4.2. D and T, which constitute the
logic program, may now be jointly expressed as

□ ∀a ∀b ∀c. kill a b <= hate a b, possess a c, weapon c.
□ ∀w. hate w w <= depressed w.
□ ∀u ∀v. possess u v <= buy u v.
□ ∀z. weapon z <= gun z.
depressed john.
buy john objl.
gun objl.

The above presentation does not rely upon λProlog’s implicit universal quantification of a
program’s logical variables. This is because our EBG algorithm differentiates between the
clauses □ ∀x. D and ∀x. □ D. (The motivation for this distinction may be found in §8.9.)
However, since explicitly specifying quantifiers can become exceedingly tedious, we introduce
the ‘!!’ shorthand to represent this universal quantification implicitly. The first clause of D
may then be expressed as

!! kill a b <= hate a b, possess a c, weapon c.

And, for the query kill john john, the resulting explanation-based generalization becomes

!! kill x x <= depressed x, buy x c, gun c.


2 For an introduction to modal logic, see Chellas [16].

kill x y
  !! kill a b <= hate a b, possess a c, weapon c.
  (a = x, b = y)

    hate x y
      !! hate w w <= depressed w.
      (w = x = y)

        depressed x

    possess x c
      !! possess u v <= buy u v.
      (u = x, v = c)

        buy x c

    weapon c

Figure 4.3: Less specific generalized proof.


Traditionally, the modal operator □ (sometimes called ‘L’) precedes necessarily true sentences,
or equivalently, those true in ‘all possible states’ or at ‘all times.’ Non-prefixed
sentences are only contingently true, that is, true in the ‘current state’ or at the ‘current time.’
Our incorporation of □ is founded upon a correspondence between (1) EBG’s separation of
domain and training theory and (2) modal logic’s separation of necessary and contingent
truth: Because the validity of the generalizations derived through EBG depends solely upon
D, more stringent truth requirements are placed upon D-clauses, namely that they be true
in all possible configurations of the problem space being modeled. Clauses of T, as they
are excluded from generalized proofs, can safely be revised or removed without invalidating
the derived generalizations (e.g., depressed john becoming false). Such revision could be
explained semantically as ‘changing states’ or ‘switching worlds.’3

Suppose that within the suicide example, we remove the □ from the clause weapon z <= gun z.
This results in the generalized proof of Figure 4.3, and the generalization

!! kill x x <= depressed x, buy x c, weapon c.

The above rule is more general than the previous one, but its application requires more
computation. This illustrates the trade-off inherent in the partitioning of D and T: D-clauses
get ‘compiled into’ the rules derived through EBG, while T-clauses must be evaluated at
‘runtime’ (the time of application).
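The trade-off can be made concrete with a small Python sketch that applies the two derived rules to a new situation and counts the lookups each one needs. Everything here is an illustrative assumption: the individuals mary and bob, the objects knife1 and pistol7, the step-counting scheme, and the function names are hypothetical and are not taken from the example theory.

```python
def holds(atom, facts, steps):
    """Establish an atom at application time, recording each attempt in steps."""
    steps.append(atom)
    if atom in facts:
        return True
    if atom[0] == "weapon":
        # The unboxed clause weapon z <= gun z must be evaluated at 'runtime'.
        return holds(("gun", atom[1]), facts, steps)
    return False

def apply_rule(last_condition, x, facts):
    """Apply  kill x x <= depressed x, buy x c, <last_condition> c.
    Returns (succeeded, number of steps performed)."""
    steps = []
    if not holds(("depressed", x), facts, steps):
        return False, len(steps)
    for fact in sorted(facts):
        if fact[0] == "buy" and fact[1] == x:
            steps.append(fact)
            if holds((last_condition, fact[2]), facts, steps):
                return True, len(steps)
    return False, len(steps)

# A hypothetical new training theory.
facts = {
    ("depressed", "bob"), ("buy", "bob", "pistol7"), ("gun", "pistol7"),
    ("depressed", "mary"), ("buy", "mary", "knife1"), ("weapon", "knife1"),
}

for person in ("bob", "mary"):
    print(person, "gun rule:", apply_rule("gun", person, facts),
          "weapon rule:", apply_rule("weapon", person, facts))
```

In this sketch the rule ending in weapon c covers both individuals, whereas the rule ending in gun c covers only the one who bought a gun; where both rules succeed, the more general rule performs one additional step.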

Now suppose instead that we replace the last clause with □ gun objl, again within the
original example. This has the effect of ‘anchoring’ the generalization to objl: an
identical query now yields the generalized proof of Figure 4.4 (whose rightmost branch
is solved), and the generalization

!! kill x x <= depressed x, buy x objl.


3 Readers familiar with EBG may wonder how the concept of operationality relates to D and T. We defer
this discussion until §4.5.

kill x y
  !! kill a b <= hate a b, possess a c, weapon c.
  (a = x, b = y)

    hate x y
      !! hate w w <= depressed w.
      (w = x = y)

        depressed x

    possess x c
      !! possess u v <= buy u v.
      (u = x, v = c)

        buy x c

    weapon c
      !! weapon z <= gun z.
      (z = c)

        gun c
          !! gun objl.
          (c = objl)

Figure 4.4: More specific generalized proof.


By moving a clause from T to D, we make the resulting generalization more specific. Such
a shift is, however, dangerous in that the generalization then depends upon the validity of
gun objl. In another configuration where objl is not a gun, the derived rule is false!
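The danger can be checked mechanically. The following Python sketch compares the anchored rule against the conclusions of the full program in two configurations; the set-of-tuples encoding and the function names are assumptions made only for this illustration.

```python
def kill_by_theory(x, facts):
    """kill x x as the full program concludes it: depressed x, buy x c, gun c."""
    return ("depressed", x) in facts and any(
        f[0] == "buy" and f[1] == x and ("gun", f[2]) in facts for f in facts
    )

def kill_by_anchored_rule(x, facts):
    """The derived rule  kill x x <= depressed x, buy x objl."""
    return ("depressed", x) in facts and ("buy", x, "objl") in facts

# Configuration in which the rule was derived.
world1 = {("depressed", "john"), ("buy", "john", "objl"), ("gun", "objl")}
# Another configuration: objl is no longer a gun.
world2 = world1 - {("gun", "objl")}

for name, world in (("world1", world1), ("world2", world2)):
    print(name, kill_by_theory("john", world), kill_by_anchored_rule("john", world))
```

In the second configuration the anchored rule still fires although the program no longer supports the conclusion, which is the sense in which the derived rule is false there.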

□ may also be used in the query language to determine the ‘necessary’ truth of a goal G,
but this is less likely to yield interesting generalizations as the proof of □ G is composed
solely of D-clauses (and is therefore isomorphic to the generalized proof). Nevertheless, the
derived rule may be a generalization in that constants of G are abstracted.

Training  instance.  Previous  realizations  of  EBG  have  used  the  term  ‘training  instance’ 
rather  than  our  ‘training  theory’  T.  While  the  literature  makes  the  same  operational  distinc¬ 
tion  of  excluding  training  instance  from  generalized  proofs,  the  term  additionally  carries  the 
connotation  of  embodying  a  single  example  situation  from  which  the  learner  should  general¬ 
ize.  We  have  taken  the  liberty  of  renaming  the  training  instance  to  avoid  that  connotation. 

Typically within logic programming implementations of EBG, atomic clauses (e.g., gun objl)
are directly recognized as belonging to the training instance [71, 64, 108]. Although this
notion of training instance offers some intuitive value, we find it artificially restrictive. There
exist atomic clauses that we might want to include within D: !! adjacent x x.4 The same is
true even for constant atomic clauses: for example, to represent that block 1 is glued to the
table we could assert □ on block1 table. Alternatively, we might want to include variables
and logical connectives within T-clauses: for example, under the temporary condition that all
blocks are stacked in two-high pairs, we might assert ∀x ∀y. on x y => (clear x, on y table).

□  furthermore  affords  the  potential  to  intermix  knowledge  of  the  domain  and  training  theory 
through  the  nesting  of  □  below  the  top-level  of  clauses. 


4 To accomplish this, some EBG systems employ the trick of writing the clause as adjacent x x <= true.

Our use of □, then, avoids what we believe to be undue limitations on the training instance:
our training theory may instead contain arbitrary λProlog clauses. □ provides an underlying
limit to the generalization by allowing overly specific knowledge to be excluded from the
derived rules. Within a nonmonotonic logic, for example, such a mechanism could guarantee
validity for the resulting generalizations by distinguishing the fixed theory from temporal
assertions. In addition to providing greater expressiveness, a modal logic representation for
the distinction between D and T can be given a clear semantics that is independent of a
particular search procedure or generalization algorithm.

Modal logic and EBG. Admittedly, the analogy that contingency is to necessity as
training theory is to domain theory is philosophically questionable. The basis for our
incorporation of □ is rather that the operator elegantly models the difference between T and D
in a formal (as opposed to an operational) manner, that is, through a formal language
and the accompanying proof system. Our use of the terms ‘contingency’ and ‘necessity’ is
meant to convey some semantic intuition about why □ models this distinction. One could
easily turn this observation around and say that we have found yet another interpretation
of □.5


The inclusion of □ within λProlog leads to a rich language for higher-order EBG: λ□Prolog.
The remainder of this chapter continues the discussion of modal logic and higher-order EBG;
within Chapters 5 & 8 we further develop and formalize λ□Prolog.


4.4  Higher-order  EBG 

In  Chapter  3  we  made  the  case  for  the  additional  expressiveness  afforded  by  higher-order 
language, and in particular for λProlog. Expressive elegance is intimately tied to effective
generalization:  If  knowledge  is  represented  in  an  inappropriate  language,  then  it  is  less  likely 
that  the  desired  generalizations  can  be  expressed  in  a  natural  and  concise  manner,  and  also 
less  likely  that  they  can  even  be  found.  In  particular,  the  cumbersome  encoding  of  higher- 
order  domains  within  first-order  languages  inhibits  reasoning  and  generalization.  But  to 
date,  the  application  of  EBG  has  been  limited  to  first-order  languages.  To  facilitate  EBG’s 
application  to  higher-order  domains,  we  extend  the  technique  to  higher-order  explanation- 
based  generalization  —  that  is,  EBG  in  which  functions  and  predicates  as  well  as  first-order 
constants  may  be  abstracted,  or  replaced  with  variables. 

We  would  like  to  assert  more  —  namely  that  first-order  encodings  are  inadequate  for  the 
task  of  generalization  over  higher-order  domains,  in  particular  because  primitive  syntactic 
manipulations  inevitably  intrude  into  the  generalizations.  This  is,  however,  simply  one 
aspect  of  the  open  argument  between  first-  and  higher-order  languages  (§3.3). 


5 There are already many such interpretations: □ can stand for ‘formally provable’, or for truth in all
reachable worlds in a Kripke semantics [69], etc.

We introduce higher-order EBG with another example frequently exploited within the
literature, that of symbolic integration. Calculus integration is fundamentally a higher-order
domain in that the items being manipulated are functions, and functions and variables that
range over functions are not naturally part of a first-order language. Consider the following
higher-order rules for integration: the first treats exponentiation, the second extracts a
constant factor, and the third splits a sum. The predicate intgr relates a function to its
indefinite integral. To increase readability, we use a mathematical notation for arithmetic
operators not included in λProlog, in particular exponentiation and division.

!! intgr (λx. x^a) (λx. x^(a+1)/(a + 1)).
!! intgr (λx. a * f x) (λx. a * h x) <= intgr f h.
!! intgr (λx. f x + h x) (λx. f' x + h' x) <= intgr f f', intgr h h'.
intgr cos sin.

The traditional binding notation of dx has been replaced with λ-terms. Missing from the
first rule is the restriction that a ≠ −1, because λProlog does not admit constraints other
than those imposed by unification.6 Readers may find the last rule more intelligible in its
η-expanded form intgr (λx. cos x) (λx. sin x). The cosine rule is an example of a T-clause
that is not ‘contingent’: while the rule is just as valid as the others, it represents a proof
step we wish to abstract under EBG.

The query

?- intgr (λx. 3 * x^2 + cos x) h.

yields the solution

h = λx. 3 * x^(2+1)/(2 + 1) + sin x

and the generalization

!! intgr (λx. a * x^b + f x) (λx. a * x^(b+1)/(b + 1) + f' x) <= intgr f f'.

The  proof  and  generalized  proof  associated  with  this  example  are  given  in  Figures  4.5  and 
4.6,  respectively. 

The generalization space of higher-order EBG is significantly larger than that of first-order,
in that higher-order constants are additionally subject to variable replacement: consider that
in the first-order case of Figure 4.2, the goal kill x y is fully general, while for higher-order,
a single variable G ranging over goals is fully general. Also unlike the proofs of §4.2 & 4.3,
the integration proofs make use of higher-order unification, which implicitly enforces the
restrictions placed upon free and bound variables: for example, within an application of the
power rule, λx. x^a will not unify with λx. x^x since a may not contain free occurrences of x.
What is more, function variables (e.g., f) may in this way appear in the derived generalizations.
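To make the role of the function variables concrete, the derived integration rule can be read as a higher-order function. The Python sketch below is only an illustration under stated assumptions: `derived_rule`, `derivative`, and the numerical check are hypothetical helpers, Python closures stand in for λ-terms, and no unification (first- or higher-order) is performed. The caller supplies f', playing the part of the remaining antecedent intgr f f'.

```python
import math

def derived_rule(a, b, f_prime):
    """Reading of  intgr (λx. a*x^b + f x) (λx. a*x^(b+1)/(b+1) + f' x) <= intgr f f'.

    f_prime is assumed to be an antiderivative of f, established separately.
    The side condition b != -1, absent from the logic program, is checked here.
    """
    if b == -1:
        raise ValueError("the power rule does not apply when b = -1")
    return lambda x: a * x ** (b + 1) / (b + 1) + f_prime(x)

def derivative(g, x, eps=1e-6):
    # Central difference, used only to sanity-check the result numerically.
    return (g(x + eps) - g(x - eps)) / (2 * eps)

# The training example: a = 3, b = 2, f = cos, f' = sin.
h = derived_rule(3, 2, math.sin)

# A different instance of the same rule: the integrand 5*x^4 + e^x.
k = derived_rule(5, 4, math.exp)

for x in (0.3, 1.0, 2.5):
    print(x, derivative(h, x), 3 * x ** 2 + math.cos(x))
```

The second instance uses a function (exp) that never occurred in the training example, which is the practical effect of f and f' being variables in the derived rule.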

6 For a discussion of logic programming with constraints, see Jaffar & Lassez [70].

intgr (λx. 3 * x^2 + cos x) r
  !! intgr (λx. g x + f x) (λx. g' x + f' x) <= intgr g g', intgr f f'.
  (g = λx. 3 * x^2, f = cos, r = λx. g' x + f' x)

    intgr (λx. 3 * x^2) g'
      !! intgr (λx. a * h x) (λx. a * h' x) <= intgr h h'.
      (a = 3, h = λx. x^2, g' = λx. a * h' x)

        intgr (λx. x^2) h'
          !! intgr (λx. x^b) (λx. x^(b+1)/(b + 1)).
          (b = 2, h' = λx. x^(b+1)/(b + 1))

    intgr cos f'
      intgr cos sin.
      (f' = sin)

Figure 4.5: Higher-order proof.


G
  !! intgr (λx. g x + f x) (λx. g' x + f' x) <= intgr g g', intgr f f'.
  (G = intgr (λx. g x + f x) (λx. g' x + f' x))

    intgr g g'
      !! intgr (λx. a * h x) (λx. a * h' x) <= intgr h h'.
      (g = λx. a * h x, g' = λx. a * h' x)

        intgr h h'
          !! intgr (λx. x^b) (λx. x^(b+1)/(b + 1)).
          (h = λx. x^b, h' = λx. x^(b+1)/(b + 1))

    intgr f f'

Figure 4.6: Higher-order generalized proof.

Related work. Donat & Wallen also address the step from first- to higher-order EBG
over  the  domain  of  symbolic  integration  [37].  Higher-order  generalization  allows  extremely 
general  rules  to  be  extracted  from  particular  problem  solutions.  Our  work  focuses  on  how  to 
control  EBG  to  avoid  over-general  generalizations,  and  yet  at  the  same  time  take  advantage 
of  the  higher-order  nature  of  the  language.  Donat  &  Wallen’s  work  concentrates  on  how 
one  could  still  usefully  apply  very  general  learned  rules.  To  that  end  they  introduce  some 
control  constructs  into  the  higher-order  unification  process.7  In  that  sense  our  approaches 
differ  fundamentally. 

Donat  and  Wallen’s  approach  also  utilizes  a  first-order  representation  of  integrals  (from 
which  they  then  produce  higher-order  generalizations).  This  first-order  encoding  requires 
additional  constraints,  manifest  in  the  constant  primitive,  which  pervade  their  derived 
rules  and  which  are  avoided  under  our  higher-order  encoding. 


4.5  Operationality 

We  illustrated  in  §4.3  how  □  defines  which  proof  steps  are  included  in  generalized  proofs. 
Within  the  EBG  paradigm,  the  traditional  means  of  restricting  the  extent  of  generalized 
proofs  is  through  operationality  criteria:  By  establishing  that  a  particular  goal  meets  an 
operationality  criterion,  the  subtree  deriving  it  is  ‘pruned’  from  the  generalized  proof.  That 
is,  an  operationality  criterion  can  be  viewed  as  a  predicate  that  determines  whether  a  given 
goal  should  be  a  leaf  of  the  generalized  proof.  The  term  ‘operational’  arises  from  the  require¬ 
ment  that  such  subgoals  be  easily  derivable,  since  operational  subgoals  must  be  established 
(solved)  in  the  course  of  applying  an  explanation-based  generalization. 

To illustrate, if we augment the original formulation of the suicide example (§4.3) with a
declaration that the goal weapon z is operational, the EBG algorithm produces the derived
rule

!! kill x x <= depressed x, buy x c, weapon c.

This follows because the computation below the operational goal weapon z is excluded
from the generalized proof. Thus, while □ establishes which branches of the proof
tree will lead to antecedents in the derived rule (because they are established by contingent
clauses), operationality criteria determine the depth to which the proof tree extends
within a branch.8

Although  □  and  operationality  criteria  are  both  mechanisms  that  limit  the  extent  of  gener¬ 
alized  proofs,  the  former  is  a  property  of  clauses  (i.e.,  whether  or  not  they  contain  □),  while 
the  latter  is  a  property  of  goals  (i.e.,  whether  or  not  they  are  operational). 
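The contrast between a property of clauses and a property of goals can be sketched in Python. The `Node` class, the `boxed` flag, and the `leaves` function are assumptions introduced for this illustration. As a further simplification, the sketch treats any goal reduced by an unboxed clause as a leaf of the generalized proof, and goals are plain strings that have already been generalized.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    goal: str
    boxed: bool = True                      # reduced by a □-clause (a D-clause)?
    children: list = field(default_factory=list)

def leaves(node, operational=lambda goal: False):
    """Collect the antecedents of the derived rule from a generalized proof tree."""
    if operational(node.goal) or not node.boxed:
        return [node.goal]                  # pruned here: the goal becomes a leaf
    collected = []
    for child in node.children:
        collected += leaves(child, operational)
    return collected                        # a boxed fact contributes nothing

# The suicide example: rules are boxed, the three facts are not.
proof = Node("kill x x", children=[
    Node("hate x x", children=[Node("depressed x", boxed=False)]),
    Node("possess x c", children=[Node("buy x c", boxed=False)]),
    Node("weapon c", children=[Node("gun c", boxed=False)]),
])

print(leaves(proof))
print(leaves(proof, operational=lambda goal: goal.startswith("weapon")))
```

With no operationality criterion the leaves are determined by the boxed flags alone; declaring weapon goals operational prunes the branch one level higher, without touching any clause.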

7 In particular, they permit filter expressions to be defined to preclude trivial higher-order unifications;
for example, in matching f(a) = a + b, the instantiation of f = λx. a + b might be less desirable than
f = λx. x + b.

8 Keller reviews existing formulations of operationality, and develops the topic significantly beyond its
treatment herein [72].

Operationality criteria present the same trade-off we have seen for □: the closer the
operational subgoals are to the root of the generalized proof, the more generally applicable the
derived rule is, but also the more work is required to apply it. In fact, one reason that the
partition between domain and training theory has not received more consideration within
the literature is that the mechanism of operationality typically precludes any T-clauses from
entering into generalized proofs.

Operationality criteria, however, do provide features beyond the capabilities of □. For
instance, they offer a generally more concise means to define generalized proofs: by declaring
only a single subgoal to be operational, the entire branch of the generalized proof underneath
is excluded, or pruned, from the generalization. Achieving the same effect with □ alone
would require removing □ from each of the program clauses applied within that branch.
Moreover, if a particular rule is used pervasively in a proof, it might be necessary to include
it within both D and T (and then use some form of additional control to discriminate between
occurrences). Operationality criteria do not present a corresponding problem, as it is unlikely
that recurring subgoals should be considered both operational and non-operational.

□ does, on the other hand, offer a means by which to generalize in an entirely different
manner: consider that even interior steps can be abstracted from generalized proofs via □.
We illustrate this with one last contrived perturbation of the suicide example:

!! kill a b <= hate a b, possess a c, weapon c.
!! hate w w <= depressed w.
!! possess u v <= buy u v.
weapon z <= gun z.
!! gun z <= pistol z.
depressed john.
buy john objl.
pistol objl.

For the standard query kill john john, the above theory leads to the generalized proof in
Figure 4.7, and the generalization

!! kill x x <= depressed x, buy x c, pistol c, (gun c => weapon c).

Note that this rule allows the use of an arbitrarily extended computation to establish the
subgoal gun c => weapon c.

We conclude that the mechanisms of operationality criteria and □ are complementary, and
while □ is sufficient to formulate the examples presented within this dissertation, we do
not suggest it as a replacement for operationality criteria. In fact, the combination of the
two is particularly attractive: modal logic induces an underlying limit to the specialization
of derived rules that potentially prohibits EBG from yielding ‘incorrect’ generalizations.
Operationality criteria, in turn, provide a means to ‘fine tune’ selection from the space of
possible generalizations admitted by □. This is particularly true of dynamic operationality
criteria, i.e., those which allow the operationality of goals to be defined and redefined
within the computational framework [64]. Since dynamic operationality criteria are subject
to change, derived rules typically incorporate some representation of the utilized criteria to
ensure their continued integrity. Through the use of □, the need for these operationality
preconditions is reduced or eliminated because of the guarantee of validity afforded by □.

kill x y
  !! kill a b <= hate a b, possess a c, weapon c.
  (a = x, b = y)

    hate x y
      !! hate w w <= depressed w.
      (w = x = y)

        depressed x

    possess x c
      !! possess u v <= buy u v.
      (u = x, v = c)

        buy x c

    weapon c

        gun c
          !! gun z <= pistol z.
          (z = c)

            pistol c

Figure 4.7: Generalized proof with internal step abstracted.

We further compare and contrast operationality and □ in §6.4.


4.6  Partial  Evaluation  vs.  EBG 

As  pointed  out  by  Van  Harmelen  &  Bundy  [129],  explanation-based  generalization  is  closely 
related  to  partial  evaluation  (PE).  On  the  surface  the  two  seem  to  be  very  different  paradigms: 
consider  that  EBG  is  a  process  of  generalization,  while  PE  is  one  of  specialization.  These 
opposing  definitions  may  be  reconciled  by  considering  two  different  views  of  EBG:  The  first, 
and  the  one  thus  far  articulated,  is  that  EBG  involves  the  generalization  of  a  particular 
solved  problem  (by  abstracting  those  solution  steps  based  upon  T-clauses  or  those  solving 
operational goals). However, one may alternatively view EBG as a specialization (or partial
evaluation) of D to the logic programming computation solving the given goal. In fact,
each  of  the  explanation-based  generalizations  we  produce  could  equally  be  derived  through 
partial  evaluation. 

In  spite  of  this  correspondence,  the  mechanisms  by  which  PE  and  EBG  produce  results  differ 
in  fundamentally  important  ways: 


•  EBG relies upon an example explanation (i.e., a goal and the accompanying solution),
while PE works from an unsolved general goal.

•  EBG employs a bias in the logic program (D vs. T and/or operationality criteria) to
determine the form of generalized proof, while PE utilizes some form of search control
to explore possible expansions of a general goal.

In short, EBG is a specialized application of partial evaluation in which search control is
provided by a particular example, the partition between D and T, and the operationality
criteria.

4.6.1  Other  Work 

Etzioni’s  thesis  [41]  considers  the  replacement,  within  the  framework  of  Prodigy  [92],  of  EBL 
with  Static,  which  instead  uses  a  formal  analysis  of  the  domain  theory  incorporating  partial 
evaluation  to  derive  new  rules.  For  the  domains  he  considers,  Static  generally  outperforms 
EBL;  that  is,  through  its  additional  analysis,  his  system  is  able  to  derive  better  rules.  We  are 
skeptical,  however,  as  to  whether  this  approach  can  be  effectively  extended  to  intractable 
domains wherein the domain theory itself is relatively small (e.g., a set of axioms), but
for which the space of possible partial evaluations (i.e., the closure of that domain theory)
is infinite. For such theories, some form of search control, such as is provided by EBG’s
example, appears indispensable.


4.6.2  Example:  “peval” 

In this section we present a rudimentary partial evaluator for λProlog (not λ□Prolog) because
it raises important issues in the difference between EBG and PE, and also because it will
be pertinent to later discussion. An unabridged partial evaluator, as well as an extended
example of partial evaluation, is included within Appendix A.3.

Figure 4.8 contains the code for peval, where its second argument is the result of partially
evaluating its first. The consequence of peval E G is that the derived rule E <= G follows
from the program. (The new class E stands for λProlog logical expressions that are both
clauses and goals, i.e., E = D ∩ G. If the potential result of PE, E <= G, is actually to
be added to the program, then E must additionally be a legal D-form.)

Search control is provided by the user, who determines (1) whether PE is to continue at
each atomic goal Ga (via stop?), (2) whether a particular applicable rule from the
program-base is to be applied in the solution of Ga (via apply_rule?), and (3) which branch of
an alternation to partially evaluate (via left?).9 The predicate hyp is used to enumerate
clauses of the program-base.10 These clauses are assumed to be in normal-form Dnf, which
we defined in §3.6.


9 Within λProlog’s input predicate read λx. G x, the variable x is bound to the entered term before
execution of read’s body G x. read λq. q, then, provides a simple method for querying the user for a yes/no
(i.e., true/false) response.

10 The need for hyp is further discussed in §8.4.1.

peval true true <= !.
peval (G1 , G2) (G3 , G4) <= !, peval G1 G3, peval G2 G4.
peval (G1 ; G2) G <= !, ((left? (G1 ; G2), !, peval G1 G); peval G2 G).
peval (∀x. G x) (∀x. G1 x) <= !, ∀x. peval (G x) (G1 x).
peval (∃x. G x) G1 <= !, peval (G y) G1.
peval (D => G) G1 <= !, norm_form D D1, hyp D1 => peval G G1.
peval Ga Ga <= stop? Ga.
peval Ga G <= hyp D, match_rule D (Ga <= Gs), apply_rule? D, peval Gs G.

match_rule D D1 <= match_rule_and D D1.
match_rule_and (D1 , D2) D <= !, (match_rule_and D1 D; match_rule_and D2 D).
match_rule_and D D1 <= !, match_rule_pi D D1.
match_rule_pi (∀x. D x) D1 <= !, match_rule_pi (D x) D1.
match_rule_pi D D.

left? G <= write G, write_string “Left branch? : ”, read λq. q.
stop? Ga <= write Ga, write_string “Stop? : ”, read λq. q.
apply_rule? D <= write D, write_string “Apply? : ”, read λq. q.

Figure 4.8: Interactive Partial Evaluator
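The control structure of peval can be approximated in Python by replacing the interactive prompts with caller-supplied policy functions. This is a deliberately reduced sketch under several assumptions: goals are propositional (atoms are strings, so no unification is involved), only true, conjunction, and alternation are handled (quantifiers and implication are omitted), there is no backtracking over a stop decision, and the dictionary `program` is a hypothetical stand-in for the program-base enumerated by hyp.

```python
def peval(goal, program, stop, apply_rule, left):
    """Return a residual goal G such that  goal <= G  follows from program,
    or None if the policies admit no expansion."""
    if goal == "true":
        return "true"
    if isinstance(goal, tuple) and goal[0] == "and":
        g1 = peval(goal[1], program, stop, apply_rule, left)
        g2 = peval(goal[2], program, stop, apply_rule, left)
        return None if g1 is None or g2 is None else ("and", g1, g2)
    if isinstance(goal, tuple) and goal[0] == "or":
        # Committed choice, as in the figure: the policy picks one branch.
        branch = goal[1] if left(goal) else goal[2]
        return peval(branch, program, stop, apply_rule, left)
    # Atomic goal: either stop here or unfold it with an approved rule.
    if stop(goal):
        return goal
    for body in program.get(goal, []):
        if apply_rule(goal, body):
            residual = peval(body, program, stop, apply_rule, left)
            if residual is not None:
                return residual
    return None

# A propositional rendering of the suicide example (an assumption of this sketch).
program = {
    "kill": [("and", "hate", ("and", "possess", "weapon"))],
    "hate": ["depressed"],
    "possess": ["buy"],
    "weapon": ["gun"],
}

def always(*args):
    return True

deep = peval("kill", program, lambda g: g in {"depressed", "buy", "gun"}, always, always)
shallow = peval("kill", program, lambda g: g in {"depressed", "buy", "weapon"}, always, always)
print(deep)
print(shallow)
```

The two calls differ only in the stop policy, yet they produce residual goals corresponding to the two derived rules discussed earlier; in EBG the same choice is made by the example proof together with the partition of the program.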

4.7  Chunking  vs.  EBG 

Yet  another  paradigm  has  been  compared  to  EBG,  this  one  coming  out  of  Soar.  “Soar  is 
an  attempt  to  build  a  general  cognitive  architecture  combining  general  learning,  problem 
solving, and memory capabilities” [112, p. 561].11 Chunking is the learning mechanism of
the  Soar  architecture.  Rosenbloom  and  Laird  present  the  case  that  EBG  is  very  similar  to 
chunking  [112].  This  correspondence  may  be  expressed  as  the  following  mapping  between 
Soar  and  our  logic  programming  formulation  of  EBG: 

EBG                                Chunking
-------------------------------    ---------------------------------------------
goal                               goal
training theory                    problem state
domain theory                      problem operators
derived rules                      chunks (new productions)
proof (logic programming trace)    problem solution
generalized proofs                 backtraced & variablized production sequences
operationality criteria            whether productions for a predicate exist

11 For an introduction to Soar and chunking, see Laird, Rosenbloom, & Newell [74, 75].

Soar solves problems at two levels: Initially, learned rules of the production system are
employed. When this strategy leads to an impasse (i.e., no production solves the problem),
Soar reverts to the operators in the problem space (theory). Upon solving the goal within
the ‘ground’ problem space, Soar backtracks and ‘variablizes’ (i.e., generalizes) the problem
solving trace. The result is a chunk, or derived production rule, which, when added to the
production system, potentially speeds future problem solving. Soar’s subgoals are assumed
to be ‘operational’ if there are pre-existing productions relevant to that subgoal.

Hopefully,  this  gross  simplification  of  the  Soar  architecture  has  given  unfamiliar  readers  a 
rudimentary  understanding  of  chunking.  We  include  this  discussion  as  it  will  be  relevant  to 
future  comparisons  (§6.3),  and  further  as  it  raises  one  problem  with  the  application  of  the 
existing  Soar/EBG  correspondence  to  our  formulation  of  EBG  —  namely  that  the  mapping 
of T to a state space is not satisfying. The problem is that λ□Prolog provides a more
expressive training theory than the training instance of typical EBG formulations. It is
unclear how λ□Prolog’s enhanced ability to abstract proof steps maps onto Soar.


4.8  Inductive  Generalization 

Within  this  section,  we  further  discuss,  for  the  interested  reader,  some  relevant  inductive 
methods  of  generalization.  As  this  discussion  does  not  bear  on  future  chapters,  it  may  be 
skipped  by  those  so  inclined. 

Anti-unification.  Perhaps  the  simplest  method  of  generalizing  an  expression  is  to  abstract 
a  particular  subexpression  with  a  variable.  This  establishes  a  partial  order  of  instance  — 
one  expression  is  an  instance  of  another  if  the  former  may  be  derived  from  the  latter  by 
substituting  terms  for  variables.  The  related  process  of  unification,  as  discussed  in  §3.2, 
determines  whether  there  exists  a  substitution  deriving  a  common  instance  from  two  or 
more  expressions. 

We may also define the duals of these notions: anti-instance (one expression is an
anti-instance of another if the former may be derived from the latter by abstracting
subexpressions with variables) and anti-unification (the process of deriving a common
anti-instance of two or more expressions).12 In defining anti-unification, we would like to
introduce the concept of a least general anti-unifier (LGAU), analogous to the most general
unifier (MGU) of unification.13 However, just as higher-order unification does not admit
MGU’s, higher-order anti-unification does not admit LGAU’s:14 consider that the
anti-unification of λx.f(g(x)) and λx.f(h(x)) yields λx.f(F(g(x), h(x))) and
λx.F(f(g(x)), f(h(x))), again neither of which is an instance of the other. Higher-order
anti-unification is further developed within Pfenning [105].

12Anti-unification was introduced independently by Plotkin [107] and Reynolds [109]. Dietzen & Scherlis
tentatively discuss anti-unification in program development under the name ‘least general generalization’ [33].
For another treatment of first-order unification and anti-unification, see Lassez, Maher & Marriott [76].

13Somewhat more formally, the instance relation forms a lattice on the expression language in which MGU
is the meet operator, and LGAU is the join [67]. (For an accessible introduction to lattices, see Stoy [125].)

14The higher-order case, then, does not admit a lattice since meets and joins are not unique.
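
For the first-order case, where a least general anti-unifier does exist, the computation can be
sketched briefly. The following Python fragment is only an illustration and is not drawn from
the systems cited above: the term encoding (strings for constants, tuples for applications)
and the generated variable names X0, X1, ... are assumptions made for this example.

```python
from itertools import count


def anti_unify(s, t, table=None, fresh=None):
    """Return a least general anti-instance of two first-order terms.

    A term is a string (a constant) or a tuple (functor, arg1, ..., argN).
    The table maps each pair of mismatched subterms to one variable, so a
    repeated mismatch is abstracted by the same variable every time.
    """
    if table is None:
        table = {}
    if fresh is None:
        fresh = count()
    if s == t:
        return s
    same_shape = (
        isinstance(s, tuple) and isinstance(t, tuple)
        and s[0] == t[0] and len(s) == len(t)
    )
    if same_shape:
        args = tuple(anti_unify(a, b, table, fresh) for a, b in zip(s[1:], t[1:]))
        return (s[0],) + args
    if (s, t) not in table:
        table[(s, t)] = f"X{next(fresh)}"
    return table[(s, t)]


# f(a, g(a)) and f(b, g(b)) generalize to f(X0, g(X0)).
print(anti_unify(("f", "a", ("g", "a")), ("f", "b", ("g", "b"))))
```

Reusing one variable per mismatched pair is what keeps the result least general; introducing a
fresh variable at every mismatch would yield f(X0, g(X1)), a strictly more general anti-instance.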

As  it  employs  multiple  examples,  anti-unification  is  an  inductive  or  similarity-based  method 
of  generalization.  While  anti-unification  itself  only  handles  positive  instances,  the  technique 
may  be  extended  to  incorporate  negative  ones  as  well,  which  leads  to  Mitchell’s  version 
spaces  [93].  Version  spaces  potentially  admit  a  higher-order  treatment  that  is  founded  upon 
higher-order  unification  and  anti-unification.  Indeed,  this  is  an  attractive  area  for  research, 
although  higher-order  version  spaces  are  complicated  by  the  lack  of  MGU’s  and  LGAU’s. 


Generalizing  logic  programs.  Anti-unification  is  strictly  a  syntactic  technique,  but  there 
are,  of  course,  ‘deeper’  methods  of  abstracting  expressions.  In  the  case  of  logic  programming, 
one  goal  might  be  considered  more  general  than  another  if  it  is  true  for  a  greater  range  of 
instantiations  to  its  variables.  For  instance,  we  could  drop  conditions  from  a  conjunction: 
(long x, wide x) becomes long x. Or we might add conditions to a disjunction: replacing
long x with (long x; wide x).15 As we have remarked, although this disjunctive expression
is more ‘general’ (in that it is less specific), it is not any more ‘abstract’ (in that it is not
any less detailed). In fact, this is in some sense a trivial generalization, since arbitrarily
complex concepts may be described as disjunctive sequences (perhaps infinite) of specific
expressions. This same problem surfaces in higher-order anti-unification: the instances
long table and wide table could be trivially anti-unified to f (long table) (wide table),
or more succinctly to (f long wide) table. This result is considered ‘disjunctive’ because,
in the simplest case, f is instantiated to project either its first or its second argument.

Each of the above techniques of generalizing logic programs (abstracting expressions with
variables, dropping conjuncts, and adding disjuncts) can be exploited within inductive
paradigms, thereby leading to approaches such as Vere’s treatment of first-order predicate
calculus [130] and Buntine’s generalized subsumption over first-order Horn clauses [12]. The
extension of these paradigms to encompass λProlog is yet another interesting topic for study.
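
The sense in which dropping a conjunct or adding a disjunct makes a goal ‘more general’ can be
checked on a small finite domain. The sketch below is a toy model in Python, not a logic
programming implementation; the objects and their properties are invented for the example.

```python
# Hypothetical toy domain: each object maps to the set of properties true of it.
OBJECTS = {
    "table": {"long", "wide"},
    "plank": {"long"},
    "rug": {"wide"},
    "coin": set(),
}


def holds(goal, props):
    """Evaluate a goal: a property name, ("and", g1, g2), or ("or", g1, g2)."""
    if isinstance(goal, str):
        return goal in props
    op, left, right = goal
    if op == "and":
        return holds(left, props) and holds(right, props)
    if op == "or":
        return holds(left, props) or holds(right, props)
    raise ValueError(f"unknown connective: {op}")


def covers(goal):
    """Return the set of objects for which the goal is true."""
    return {name for name, props in OBJECTS.items() if holds(goal, props)}


specific = covers(("and", "long", "wide"))    # (long x, wide x)
dropped = covers("long")                      # conjunct dropped: long x
widened = covers(("or", "long", "wide"))      # disjunct added: (long x; wide x)
print(specific <= dropped <= widened)
```

Each step enlarges the set of satisfying instantiations, which is the ordering on goals
described above. The disjunctive goal covers the most objects, yet it says nothing more
abstract about them than the two properties it lists.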


The  application  of  inductive  techniques.  While  not  considered  herein,  similarity- 
based  techniques  of  generalization  are,  nevertheless,  relevant  to  the  overall  vision  of  tools 
for  design-based  problem  solvers  articulated  in  Chapter  2.  For  one,  this  is  because  inductive 
reasoning  is  required  for  analogical  problem  solving;  that  is,  similarity-based  generalization 
is a means by which an analogical correspondence can be established between a solved and
an unsolved problem. The combination of this correspondence and the known solution serves as
a guide in the construction of a new derivation. Analogical problem solving in this manner
remains  safe  so  long  as  the  resulting  derivation  (for  our  domain,  a  logic  programming  proof) 
can  be  ‘replayed’  to  establish  validity. 


15For  some  further  possibilities  of  generalization  see  Dietterich,  et  al.  [30,  pp. 365-368]. 
Chapter 5

The  Extension  of  Logic  Programs 


In the previous chapter, we defined and illustrated the process of EBG. The resulting
explanation-based generalizations, however, are of little use unless we also provide a means
for assimilation, that is, for augmenting the existing logic program V with new rules. We
define learning, then, to be the combination of generalization and assimilation. But rather
than reserving the term ‘assimilation’ for the addition of clauses derived through EBG, we
instead use it to refer to the extension of V with arbitrary D-forms.

The design of an appropriate assimilation mechanism for λProlog is complicated by our
desire that it have the following characteristics:

• Semantic simplicity. The designers of λProlog took care to cultivate semantic elegance
within the language. Therefore, we require that the language primitives we introduce
for controlling generalization and assimilation continue in the λProlog ‘spirit’.
Primarily, this means that such mechanisms admit a declarative semantics, which should
permit guarantees such as “this additional assumption already follows from the
program.” By ‘declarative’ we mean that the effects of a construct be readily scrutinized;
i.e., that the construct have a straightforward definition which is easy to reason about.
Moreover, this declarative characterization should exist apart from any particular
operational interpretation (language implementation). (The distinction between
declarative and operational descriptions will be further developed as we progress through
this chapter.)

• Programmability. Any proposed assimilation mechanism should truly be ‘programmable’,
since we do not believe that automatic (i.e., uncontrolled) assimilation of derived facts
or generalizations is either practical or desirable. Under a native approach to learning,
the underlying problem solving architecture produces and assimilates generalizations
in the course of solving each query (at least when learning is ‘switched on’). Except for
the fact that explanation-based generalizations represent abstractions of computation,
this approach to learning is analogous to following the solution of every Prolog query
G with assert G. The resulting proliferation of clauses leads to increased matching
overhead, and then perhaps to impaired rather than improved performance.
Consequently, it generally becomes necessary that assimilation under such a paradigm either
be selective (e.g., via some performance analysis) or involve the forgetting of those
derived rules only infrequently used.1

Confining generalization and assimilation to the underlying architecture is problematic:

    • To avoid the additional cost of producing generalizations and that of matching
      against a proliferation of rules in the program-base, generalization should
      be only selectively enabled (in our view, by the programmer, through the
      programming language).

    • The results of EBG should be subject to modification by the program before
      assimilation: for reasons we shall illustrate, it is often desirable for the
      assumed clause to differ from the derived rule. Typically, this variation may
      be expressed as a simple forward reasoning step.

Thus, we advocate an architecture in which generalization and assimilation are realized
through language features rather than as aspects of the underlying system (and therefore
inaccessible to programmer control). The resulting language, λ□Prolog, is intended to
serve as an effective platform for programming higher-order applications relying upon
explanation-based learning.

In developing a more logical approach to explanation-based learning, we recognized the need
for more limited forms of generalization within λProlog. The limited generalization to which
we are referring is that of universally quantifying, or universally generalizing, existing free
variables. Within Prolog this is accomplished by assert, in that assert (p x) implicitly
universally quantifies x, effectively adding the clause ∀x. p x to the program.

The logic programming predicates assert, retract, call, univ, and var may be characterized
as ‘meta-logical’, because they are concerned more with the manipulation of logic programs,
including the currently running program itself, than with the logic of the language.2 That
is, they function at a fundamentally different level, the meta-level, and therefore are
typically defined only operationally (e.g., within an interpreter), and apart from the logical
foundation of the language. This leads to problems in analysis and compilation (see [79], for
example).

In this chapter we provide a logical foundation to a class of applications previously requiring
assert. To that end, we propose the rule construct, which introduces a limited element of
forward reasoning into λProlog. The rule construct allows us to program in a natural and
declarative way many meta-programming applications (e.g., memoization, partial evaluation
combined with reflection, and resolution) that heretofore relied upon extra-logical features.

Later, we develop an analogous construct, rule.ebg, which also extends a program by one of
its consequences. The difference is in the consequence to be assumed: rule’s assumption is
derived through universal generalization, while rule.ebg’s is the result of explanation-based
generalization.

1For a discussion of these issues see Prieditis & Mostow [108, pp. 496-497], Minton [91], and Donat &
Wallen [37].

2call G causes G to be solved; the λProlog equivalent is simply G. univ provides for the destructuring
of Prolog terms into a functor (predicate) and arguments; within λProlog, higher-order unification addresses
this task. var M succeeds if M is a logical variable and fails otherwise; there is no λProlog analog.

Although both of these constructs are proposed and applied in the framework of λProlog,
the underlying ideas are general, and thus relevant to other logic programming languages.

Other work. We briefly mention three efforts concerned with establishing a formal
account of meta-level logic programming constructs: As discussed in Chapter 3, λProlog [100]
uses a fragment of higher-order intuitionistic logic, and gives a logical foundation for call
(through higher-order predicates) and some uses of assert (through embedded implication).
HiLog [17] uses an even narrower fragment of higher-order logic, and can give a declarative
account for many uses of univ and call. The language Gödel [14] follows a different approach
by explicitly separating the meta-level from the object-level.


5.1  Existing  Approaches  to  Extending  Logic  Programs 

5.1.1  Prolog’s  “assert” 

Prolog  permits  the  modification  of  the  current  logic  program  through  the  primitives  assert 
and  retract:  assert  D  adds  clause  D  to  the  program,  while  retract  D  removes  D.3  The 
following  list  characterizes  the  more  frequent  applications  of  assert  and  retract  in  Prolog: 

• Memoization: To avoid the re-computation of previously solved goals, derived results
are memoized, or cached.4 Herein, the programmer must ensure that assert is only
applied to goals deductively following from the original program V. We call this a
conservative extension of V. For example, the following definition of the Fibonacci
function will not recompute values (unless it backtracks after an initial solution):

fib 0 1.
fib 1 1.
fib m n <= m > 1, m1 is m - 1, m2 is m - 2,
           fib m2 n2, asserta (fib m2 n2),
           fib m1 n1, asserta (fib m1 n1), n is n1 + n2.

Memoization  is  a  rudimentary  form  of  forward  reasoning  —  which  we  define  to  be  any 
paradigm  in  which  a  knowledge-base  (in  this  case,  a  logic  program)  grows  by  computing 
and  assimilating  facts  that  follow  deductively.  Although  individual  goals  are  derived 
through  the  standard  backchaining  of  Prolog,  their  assimilation  represents  a  forward 
reasoning  step. 


3Prolog  implementations  typically  offer  both  asserta  and  assertz:  the  former  effectively  adds  the  clause 
to  the  beginning  of  the  program,  while  the  latter  does  so  at  the  end.  For  purposes  of  general  discussion,  our 
use  of  assert  encompasses  both  constructs. 

4There is an inherent tradeoff in the application of memoization: the overhead of matching against a
proliferating set of program clauses can result in deteriorating rather than improved performance.




• Interaction memoization: This is an alternative application of memoization in which
a program queries the user for assistance, and then records the result of the interaction
with assert. Such extensions to the program are generally not conservative. See
Rowe [113, pp. 126-127] for an example.

• Program reflection: Reflection is the mapping of the data structure representing a
program into an executable version of that program. (Its inverse, the mapping of
an executable into a data structure for scrutiny or manipulation, is reification.) The
need for reflection arises when one wants to run a program constructed by another
program. Reflection allows the derived program to be executed directly, in that way
avoiding the inefficiency and complexity of interpreting a program datatype.

The results of partial evaluation (PE) represent one important application of
reflection. In the context of logic programming, partial evaluation consists of deriving a
sufficient condition G for a particular query E;5 that is, PE produces a specialization
of the logic program V that captures the computation leading from E to G. Through
use of the resulting derived rule E <= G, we avoid re-doing the intervening
computation. In §4.6, we introduced an interactive partial evaluator peval for λProlog. Rules
derived through peval could be assimilated with assert, as in the top-level predicate
peval-top:

peval-top  E  <=  peval  E  G,  asserta  (E  <=  G). 

In  §5.3.1  we  show  how  reflection  can  be  achieved  declaratively  with  our  proposed  rule 
construct. 

• Retaining information across a failure: The assumptions made by assert extend
beyond a failure; that is, backtracking does not retract asserted clauses. This leads to
the use of assert as a means to communicate ‘across’ a failure, which we illustrate
through the coding of a bagof predicate. bagof produces a list L of every instance
that satisfies a given single-argument predicate P. The following implementation of
bagof exploits logic programming’s backtracking search to iterate over potential values
for P’s argument x. assert and retract are used to maintain the intermediate values
of this iteration.6

bagof P L <= asserta (temp nil), fail.
bagof P L <= P x, temp K, retract (temp K),
             asserta (temp (x :: K)), fail.
bagof P L <= temp L, retract (temp L).

The  first  clause  initializes  temp  —  an  accumulator  for  L.  The  second  ‘iterates’  (via 
backtracking  search)  over  values  of  x  that  satisfy  P,  storing  them  within  K.  When  no 
more  such  x’s  can  be  found,  the  ‘bag’  is  returned  by  the  third  clause. 

5As introduced within Chapter 4, the class E stands for λProlog logical expressions that are both clauses
and goals, i.e., E = D ∩ G.

6This  encoding  of  bagof  is  a  higher-order  version  of  Rowe’s  [113,  pp.236-238]. 
• Mutable data: assert and retract additionally support the side-effecting of global
data structures. For instance, a breadth-first graph search may be implemented by
updating a fringe clause that contains the set of vertices currently on the fringe of
the search [113, p.234]. Similarly, assert could be employed to side-effect the graph
itself for computing, say, its transitive closure.

• Search control: assert and retract can be used to supersede Prolog’s normal
depth-first search by hiding, reordering, or revealing clauses, or instead by setting global
‘variables’ to perform the same function. For example, Rowe describes a ‘focus-of-attention’
forward reasoner which recalls facts (via a fact predicate) for forward-chaining and
then shifts them to ‘used’ status (by retracting fact and asserting usedfact) [113,
pp. 137-140].
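
To make the first item of the list concrete, the memoizing Fibonacci definition can be mimicked
outside logic programming. The Python sketch below is an analogy only: the dictionary facts
stands in for the clause store, assignment into it plays the role of asserta, and the names
make_fib and plain_fib are introduced here for illustration.

```python
def plain_fib(m):
    """The original program with no assimilation (fib 0 = fib 1 = 1)."""
    return 1 if m < 2 else plain_fib(m - 2) + plain_fib(m - 1)


def make_fib():
    """Build a memoizing fib together with its store of assimilated facts."""
    facts = {0: 1, 1: 1}    # the unit clauses fib 0 1 and fib 1 1
    computed = []           # arguments that required actual computation

    def fib(m):
        if m in facts:      # a stored fact answers the goal immediately
            return facts[m]
        computed.append(m)
        n2 = fib(m - 2)
        facts[m - 2] = n2   # plays the role of asserta (fib m2 n2)
        n1 = fib(m - 1)
        facts[m - 1] = n1   # plays the role of asserta (fib m1 n1)
        return n1 + n2

    return fib, facts, computed


fib, facts, computed = make_fib()
print(fib(10))
# Every stored fact already follows from the original definition,
# which is what makes the extension conservative.
print(all(value == plain_fib(m) for m, value in facts.items()))
```

No argument appears twice in computed, reflecting the claim that values are not recomputed. The
check in the last line is exactly the obligation that memoization places on the programmer, and
nothing in the store itself enforces it.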

Of course, assert (in combination with retract) also supports more general instances of
self-modifying code, and is largely accepted as an important and necessary feature of logic
programming languages. The principal drawback of assert, however, is that it has no
accessible declarative meaning. Consequently, work on the semantics of logic programs typically
ignores it: consider that there is no straightforward means for incorporating assert/retract
within inference systems such as that defined in §3.7. And as a result, Prolog
implementations behave inconsistently: Lindholm & O’Keefe [79, p.22] offer the example

p <= assertz p, fail.
p <= fail.

Whether ?- p will succeed or fail depends upon the semantics of assert: Given a goal to
solve p, should the set of relevant clauses be determined once for p’s solution, or should it
instead be dynamically adjusted (following changes in the logic program itself) in the course of
solving p? Under the former interpretation, ?- p fails, while under the latter it succeeds.

Given  the  more  dynamic  approach,  there  are  still  potential  inconsistencies  in  the  availability 
of  additions  to  the  logic  program.  The  alternative  example 

q <= fail.
q <= assertz q, fail.

behaves differently in some Prolog implementations: the Warren Abstract Machine [132], an
abstract interpreter forming the basis of several Prolog implementations, succeeds on
?- p and fails on ?- q.
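
The three behaviors just described can be reproduced with a very small simulation. The Python
sketch below is not a model of any actual Prolog system: it handles a single propositional
predicate, and its three modes (‘snapshot’, ‘live’, and ‘lookahead’) are names invented here.
In particular, the ‘lookahead’ mode is only an assumed approximation of an implementation that
decides whether further alternatives remain before it runs a clause.

```python
def run_body(body, clauses):
    """Execute a clause body made of the steps 'assertz' and 'fail'."""
    for step in body:
        if step == "assertz":
            clauses.append([])    # add the unit clause at the end of the program
        elif step == "fail":
            return False
    return True                   # an exhausted body means the clause succeeded


def solve(program, mode):
    """Try the clauses of one propositional predicate under a visibility mode."""
    clauses = [list(body) for body in program]
    # 'snapshot' fixes the candidate clauses when the goal is first called.
    candidates = list(clauses) if mode == "snapshot" else clauses
    i = 0
    while i < len(candidates):
        has_next = i + 1 < len(candidates)    # judged before the clause runs
        if run_body(candidates[i], clauses):
            return True
        if mode == "lookahead" and not has_next:
            return False                      # committed to the last alternative
        i += 1
    return False


P = [["assertz", "fail"], ["fail"]]    # p <= assertz p, fail.   p <= fail.
Q = [["fail"], ["assertz", "fail"]]    # q <= fail.   q <= assertz q, fail.

for mode in ("snapshot", "live", "lookahead"):
    print(mode, solve(P, mode), solve(Q, mode))
```

Under ‘snapshot’ both queries fail and under ‘live’ both succeed. Only ‘lookahead’ separates
them, succeeding on p and failing on q, since the clause added by q’s final alternative arrives
after the decision that no alternatives remain.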

For this and other reasons, λProlog does not include assert, although some of assert’s
functionality is subsumed by another construct, embedded implication. However, as we
shall illustrate, embedded implication is neither powerful enough to support the above
applications of assert, nor for that matter, to support the assimilation of explanation-based
generalizations. This led us to explore the possibility of making logically motivated
extensions to λProlog that address these deficiencies. In particular, we focus upon those uses
of assert above that involve memoization and reflection. The other illustrations of assert
are frequently stylistically questionable, and can often be reformulated without assert in a
manner no more complex and no less efficient. In any case, rule is not intended to subsume
the functionality of assert, but rather to provide a more declarative alternative for some
subset of its uses.
5.1.2  Embedded  Implication

It has been argued in the literature [83, 47, 9] that implication (with its intuitionistic
meaning) can, in many situations, be used in place of assert, and can also be given a simple
declarative semantics. The operational reading of embedded implication is that when
solving the goal D => G, assume D while solving G. Thus without any program, the query

?- p 1 => p x.

succeeds with the answer x = 1. The assumption of an implication is in effect exactly while
solving the consequent, and hence

?- (p 1 => p x), p y.

will fail, though

?- ((p 1, p 2) => p x), x = 2.

succeeds after some backtracking.

Implication  is  of  particular  importance  when  we  wish  to  make  an  assumption  for  a  particular 
computation  and  then  ‘forget’  it.  Consider  a  reformulation  of  peval-top: 

peval-top E K <= peval E G, (E <= G) => K.

The revised peval-top takes two arguments: the goal E to be partially evaluated, and a
second goal K representing the context, or scope, for which the assumption E <= G will be
valid. (‘K’ is for ‘continuation’, which is developed below.) The rationale behind peval-top
is that the client has some computation (captured in K) for which a particular specialization
of the program E <= G is applicable, yet does not wish to make that optimization
permanent (since, perhaps, it impairs performance in the general case).

At  first,  it  might  appear  that  the  following  definition  would  behave  identically: 

peval-top E K <= peval E G, asserta (E <= G), K, retract (E <= G).

However, the above is not equivalent to the preceding version. Suppose that the
computation associated with K also makes extensions to the logic program. Should one of these
assumptions unify with E <= G, V could be left in an inconsistent state. Such potentially
conflicting side-effects illustrate the difficulty in reasoning about programs that use assert;
that is, they illustrate the non-declarative nature of those programs.

In fact, assert and retract are sufficient to encode implication in general, subject to
limitations discussed below. The following uses assert and retract to establish the appropriate
assumption, even in the face of backtracking:7

(D => G) <= (asserta D; (retract D, fail)),
            G,
            (retract D; (asserta D, fail))

where backtracking over the first conjunct retracts D, and backtracking over the third
re-asserts D (for G’s solution). Of course, the above implementation does not address conflicting
side-effects, universal generalization (§5.1.3), or the precise scoping of the asserted clause
(§5.1.4).

7This formulation is due to Stuart Shieber.

5.1.3  Universal  Quantification  in  Assumptions 

In a Horn logic, all assumptions are closed: whatever apparently free variables occur in a
clause D are in fact universally quantified. Because of this, a Horn logic program cannot
change during its execution (at least, not without the application of meta-logical predicates
such as assert). As pointed out in Chapter 3, this is not the case for logics that include
embedded implication: assumptions therein added to V may contain logic variables that are
not copied when that clause is used. Instead, the program may actually change (through
variable instantiation) in the course of solving G. Thus, we distinguish between the
assumptions p x and ∀x. p x. This is no great inconvenience: a clause occurring at the top-level in
a program (typically those read-in from a file) is still considered to be universally quantified
over its free variables, but no such convention exists for embedded implications.

This points out a manner in which implication is less powerful than assert: the former’s
assumption is not universally generalized. For instance,

?- asserta (p x), p 1, p 2.

succeeds in Prolog, while

?- p x => (p 1, p 2).

fails in λProlog: as one can see, there is no x such that p x implies both p 1 and p 2.
Operationally, what happens is that resolving p 1 with the assumption p x instantiates x to
1, and the now instantiated assumption p 1 does not unify with the second subgoal p 2. On
the other hand, the following clearly succeeds:

?- (∀x. p x) => (p 1, p 2).

It  should  be  remarked  here  that  this  behavior  of  embedded  implication  is  not  a  design 
mistake,  but  has  its  applications,  and,  furthermore,  is  entailed  by  the  desire  to  make  only 
logically  sound  extensions  to  basic  Horn  logic  (for  a  further  discussion  see  [83]). 
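
The operational difference between an open assumption p x and its universal closure can be
traced with a toy solver. The Python sketch below is an illustration under strong simplifying
assumptions and is not an implementation of λProlog: it supports only unit assumptions about
one-argument predicates, conjunction, and embedded implication, and the goal encoding (tuples
tagged "atom", "and", "imp", "open", "forall") is invented for this example.

```python
class Var:
    """A logic variable; distinct objects are distinct variables."""
    def __init__(self, name):
        self.name = name


def walk(term, subst):
    """Follow variable bindings until an unbound variable or a value is reached."""
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Return an extended substitution unifying a and b, or None on failure."""
    a, b = walk(a, subst), walk(b, subst)
    if a is b:
        return subst
    if isinstance(a, Var):
        return {**subst, a: b}
    if isinstance(b, Var):
        return {**subst, b: a}
    return subst if a == b else None


def solve(goal, assumptions, subst):
    """Yield every substitution that solves goal under the given assumptions."""
    kind = goal[0]
    if kind == "atom":
        _, pred, arg = goal
        for a_kind, a_pred, a_arg in assumptions:
            if a_pred != pred:
                continue
            if a_kind == "forall":
                a_arg = Var("fresh")    # a closed clause is renamed at each use
            extended = unify(arg, a_arg, subst)
            if extended is not None:
                yield extended
    elif kind == "and":
        _, left, right = goal
        for s in solve(left, assumptions, subst):
            yield from solve(right, assumptions, s)
    elif kind == "imp":
        _, clause, body = goal
        # The assumption is visible only while the consequent is being solved.
        yield from solve(body, assumptions + [clause], subst)


def succeeds(goal):
    return next(solve(goal, [], {}), None) is not None


x = Var("x")
both = ("and", ("atom", "p", 1), ("atom", "p", 2))
print(succeeds(("imp", ("open", "p", x), both)))         # p x => (p 1, p 2)
print(succeeds(("imp", ("forall", "p", None), both)))    # (forall x. p x) => (p 1, p 2)
```

In the first query the single variable x is shared by both uses of the assumption, so the
binding made for p 1 blocks p 2. In the second, each use of the closed clause receives its own
fresh variable, which is the effect that assert obtains implicitly in Prolog.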

This limitation of implication does illustrate a problem within the preceding formulation of
peval-top: a clause E <= G derived by partial evaluation and then assumed (via implication)
can only be used with one substitution for its logical variables. We conclude that neither
implication nor assert is the proper mechanism for the situation as described.
5.1.4  Goal  Continuations


Implication is also restricted in that its precise scope can limit the exploitation of its
assumptions. Consider another definition of Fibonacci which attempts to exploit implication
for memoization:

fib 0 1.
fib 1 1.
fib m n <= m > 1, m1 is m - 1, m2 is m - 2,
           fib m2 n2, fib m2 n2 => fib m1 n1, n is n1 + n2.

The problem here is that fib’s assumptions are not uniformly visible: consider a recursive
trace of fib as a binary tree, where the computation of the nth Fibonacci F_n is reduced to
that of F_{n-2} and F_{n-1}, with F_{n-2} known for the latter:

                              F_n
                /                            \
           F_{n-2}                   F_{n-2} => F_{n-1}
          /       \                   /              \
    F_{n-4}  F_{n-4} => F_{n-3}   F_{n-3}    F_{n-3} => F_{n-2}

So in the recursive computation of F_{n-1}, the only pending assumption is that of F_{n-2}, and
hence F_{n-3} must be re-derived. Thus, while the above performs considerably better than a
similar program without implication, the original version using assert is substantially more
efficient (linear).
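
The relative costs of the three formulations can be compared by counting calls. The Python
sketch below is an analogy rather than a translation: a dictionary passed down the recursion
stands in for assumptions made by implication, a dictionary mutated in place stands in for
assert, and all function names are introduced here for illustration.

```python
def fib_naive(m, counter):
    """No memoization at all."""
    counter[0] += 1
    if m < 2:
        return 1
    return fib_naive(m - 2, counter) + fib_naive(m - 1, counter)


def fib_scoped(m, assumed, counter):
    """Assumptions behave like embedded implication: visible in one scope only."""
    counter[0] += 1
    if m in assumed:
        return assumed[m]
    if m < 2:
        return 1
    n2 = fib_scoped(m - 2, assumed, counter)
    # The fact about m - 2 is in force only while solving for m - 1.
    n1 = fib_scoped(m - 1, {**assumed, m - 2: n2}, counter)
    return n1 + n2


def fib_global(m, facts, counter):
    """Facts behave like asserted clauses: visible to all later computation."""
    counter[0] += 1
    if m in facts:
        return facts[m]
    if m < 2:
        return 1
    n2 = fib_global(m - 2, facts, counter)
    facts[m - 2] = n2
    n1 = fib_global(m - 1, facts, counter)
    facts[m - 1] = n1
    return n1 + n2


def count_calls(f, m, *state):
    """Run f on m and return its result together with the number of calls made."""
    counter = [0]
    return f(m, *state, counter), counter[0]


for name, run in [("naive", count_calls(fib_naive, 20)),
                  ("scoped", count_calls(fib_scoped, 20, {})),
                  ("global", count_calls(fib_global, 20, {}))]:
    print(name, run)
```

All three agree on the result, and the call counts fall in the order described above: the
scoped version needs far fewer calls than the naive one but still grows faster than linearly,
because facts derived inside one branch are invisible to its sibling branches.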

This problem can effectively be circumvented by reformulating the program in
continuation-passing style (CPS) [110], which the reader may have encountered in the context of functional
programming. To realize CPS under λProlog, we add another argument K (a goal) to our
predicate. This goal is intended to represent the remainder of the computation, and thus
rather than returning control upon success, clauses invoke this ‘goal continuation.’ In this
way, accumulated assumptions are made available to extended computations. The following
formulation of fib makes use of CPS:

fib m n <= fib1 m n true.

fib1 0 1 K <= K.
fib1 1 1 K <= K.
fib1 m n K <= m > 1, m1 is m - 1, m2 is m - 2,
              fib1 m2 n2 ((∀K'. fib1 m2 n2 K' <= K') =>
                fib1 m1 n1 ((∀K'. fib1 m1 n1 K' <= K') => (n is n1 + n2, K))).

While this illustration may be somewhat inscrutable to those not acclimated to CPS
(higher-level notations would be helpful), the underlying intuition is not that difficult: K captures
the computation necessary to solve a pending fib1 calculation. In that regard, it acts as
an accumulator for the pending subgoals of that computation. Within the last clause,
fib1 m2 n2 is computed, and then the program is extended with the assumption that for any
K, fib1 m2 n2 K reduces to K (since m2 and n2 have been instantiated to particular values).
In this way, fib1 m2 n2 need not be re-calculated by any computation nested within K.
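
The way a continuation carries accumulated assumptions forward can also be seen in a functional
setting. The Python sketch below is an analogy to the CPS program, not a rendering of it: the
immutable dictionary handed to each continuation stands in for the assumptions in force when K
is invoked, and the names fib_cps and fib_with_count are introduced here for illustration.

```python
def fib_cps(m, facts, k, counter):
    """Compute fib m, then call k with the result and the facts known so far."""
    counter[0] += 1
    if m in facts:
        return k(facts[m], facts)
    if m < 2:
        return k(1, facts)

    def after_m2(n2, facts2):
        # Extend the facts with fib (m - 2) before continuing with fib (m - 1).
        facts2 = {**facts2, m - 2: n2}

        def after_m1(n1, facts1):
            facts1 = {**facts1, m - 1: n1}
            return k(n1 + n2, facts1)

        return fib_cps(m - 1, facts2, after_m1, counter)

    return fib_cps(m - 2, facts, after_m2, counter)


def fib_with_count(m):
    """Return fib m together with the number of calls to fib_cps."""
    counter = [0]
    result = fib_cps(m, {}, lambda n, facts: n, counter)
    return result, counter[0]


print(fib_with_count(20))
```

No dictionary is ever mutated, so each extension is scoped just as an implication is. Because
the rest of the computation runs inside that scope, however, every fact derived earlier is
available to it, and the number of calls grows only linearly with m.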

Our motivation for this digression into CPS is that it is a powerful mechanism through which
implication (and later our extension, rule) can be more fully exploited within λProlog.
5.1.5  Persistence  of  Assumptions


A further limitation of implication’s well-defined scope is that there does not appear to be a
means for making assumptions persistent. eLP (our λProlog implementation) circumvents
this with no loss of elegance by allowing the programmer to initiate a new top-level
interpretation via the special goal top (introduced by Pfenning), which recursively invokes a new
λProlog listener, thereby effectively ‘globally’ extending the existing logic program with any
pending assumptions, such as within fib1 m n top.


5.1.6  Summary 

We  have  seen  that  assert  and  retract  are  insufficient  to  program  implication,  due  to  the 
lack  of  proper  scoping  and  the  possibility  of  conflicting  side-effects.  Conversely,  we  find  that 
there  are  three  aspects  of  assert  which  are  difficult  to  model  with  embedded  implication: 

1.  global  accessibility  of  the  asserted  clause,  although  this  can  often  be  achieved  using 
continuation  passing  style; 

2. persistence of the asserted clause, which has been addressed in λProlog with the special
predicate top; and

3.  universal  generalization  of  assumed  clauses. 

The last of the three, universal generalization, is the most problematic, because there is
often no way to program it short of completely reformulating the data representation.8 It
is  also  universal  generalization  which  is  addressed  by  our  proposed  rule  construct.  Since 
rule  resembles  implication  in  that  its  assumptions  are  always  given  a  limited  scope,  the 
techniques  employed  in  (1)  and  (2)  will  continue  to  be  relevant  for  programming  with  the 
new  construct. 


5.2  Lemma 

We  seek  to  address,  in  a  declarative  manner,  embedded  implication’s  inadequacy  with  regard 
to  universal  generalization.  To  that  end,  in  the  section  to  follow  we  propose  the  rule 
construct. Before we introduce rule, however, we first attempt to motivate that extension
with a less general counterpart, the lemma construct. lemma, which we later establish to
be a special case of rule, brings to light many issues relevant to rule's development.

8This  is  essentially  the  solution  advocated  by  Burt  et  al.  [14]. 
5.2.1 Prolog's “lemma”


The semantic inelegance of assert has led to the consideration of other means by which
logic  programs  may  be  extended.  One  such  alternative  is  lemma  as  described  by  Sterling 
&  Shapiro  [124,  p.181],  which  may  be  defined  within  Prolog  as 

lemma  E  <=  E,  asserta  E. 

(Actually,  Sterling  &  Shapiro’s  formulation  is 

lemma E <= E, asserta (E <= !).

which  does  not  backtrack  to  other  clauses  if  a  lemma  applies.) 

lemma  is  more  ‘logical’  than  assert  in  that  it  only  permits  conservative  extension  —  that 
is,  added  clauses  necessarily  follow  from  the  theory  described  by  the  logic  program.  We  may 
again  reformulate  Fibonacci  as 

fib  0  1. 

fib  1  1. 

fib m n <= m > 1, m1 is m - 1, m2 is m - 2,
        lemma (fib m2 n2), lemma (fib m1 n1), n is n1 + n2.

5.2.2  A  Scoped  “lemma”  Construct 

Prolog’s  lemma  takes  a  single  argument  —  the  goal  E  to  be  solved  and  then  assumed. 
We will now define an analogous lemma within λProlog. The new lemma more resembles
implication in that it gives its assumption a proper scope. λProlog's lemma, then, requires
two arguments: (1) the goal E to be solved and assumed, and (2) a goal K representing the
scope for which the assumption is valid. lemma E K may be informally characterized as

lemma E K <= E, (E' => K).

where E' is a universal generalization of E.

To  illustrate  an  application  of  this  scoped  lemma,  consider  one  last  reformulation  of  fib: 

fib  0  1. 

fib  1  1. 

fib m n <= m > 1, m1 is m - 1, m2 is m - 2,
        lemma (fib m2 n2)
              (lemma (fib m1 n1)
                     (n is n1 + n2)).

(This version suffers from precisely the same inefficiency as that relying upon embedded
implication. In fact, the two are equivalent. See §5.1.4.) As with embedded implication,
persistence can be achieved with lemma E top.
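
The scoping discipline of lemma E K can be imitated in Python with a context manager. This is a hedged analogue only: the table assumptions and the helper assume are hypothetical names, and a dictionary entry stands in for an assumed clause. The assumption is installed after E has been solved, is visible while the body (the scope K) runs, and is discharged on exit.

```python
from contextlib import contextmanager

assumptions = {}          # stands in for the clauses currently in scope


@contextmanager
def assume(key, value):
    """Make `key -> value` available only inside the `with` body (the scope K)."""
    missing = object()
    previous = assumptions.get(key, missing)
    assumptions[key] = value
    try:
        yield
    finally:                       # leaving the scope discharges the assumption
        if previous is missing:
            del assumptions[key]
        else:
            assumptions[key] = previous


def fib(m):
    if m in assumptions:               # a lemma in scope answers the goal
        return assumptions[m]
    if m <= 1:
        return 1
    n2 = fib(m - 2)                    # solve E ...
    with assume(m - 2, n2):            # ... then assume it while solving K
        n1 = fib(m - 1)
        with assume(m - 1, n1):
            return n1 + n2
```

As in the scoped lemma formulation, results derived inside the computation of fib (m - 2) are discharged before fib (m - 1) begins, so some recomputation remains; only the continuation-passing variant makes every earlier result available to later goals.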

Yet  lemma  is  in  some  ways  more  powerful  than  embedded  implication.  For  example,  new 
clauses  may  be  derived  through  lemma:  from  the  program 

child_of x y <= parent y x.
parent x y <= child_of y x.
ancestor x y <= parent x y; (parent x z, ancestor z y).

we may derive the general goal

?- child_of x y => ancestor y x.

for arbitrary (uninstantiated) x and y. lemma affords the universal generalization of such
variables: the goal

?- lemma (child_of x y => ancestor y x) K.

will assume the derived clause

∀x ∀y. ancestor y x <= child_of x y.

before  attempting  the  solution  of  K.  In  this  way,  lemma  supports  the  extension  of  the 
program  with  new  universally  quantified  clauses  that  follow  from  that  program.  Implication 
alone  cannot  universally  generalize  x  and  y. 

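The operational reading of $\mathcal{P} \vdash \mathtt{lemma}\ E\ K$ is

Solve $\mathcal{P} \vdash E$. If this fails, backtrack. Otherwise, it succeeds with substitution $\theta$.

Let $\mathcal{Y}$ be the set of the free variables remaining in $\theta E$ that do not appear free
in $\theta\mathcal{P}$, and let $\forall\mathcal{Y}$ stand for the universal quantification of each $y$ in $\mathcal{Y}$. Thus,
$\forall\mathcal{Y}.\ \theta E$ is the universal generalization of $\theta E$ over variables that do not occur in
$\theta\mathcal{P}$.

Next solve

$\{\forall\mathcal{Y}.\ \theta E\} \cup \theta\mathcal{P} \vdash \theta K$

If this succeeds with substitution $\psi$, then $\mathcal{P} \vdash_{\psi\theta} \mathtt{lemma}\ E\ K$.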
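
The only non-standard ingredient of this reading is the computation of $\mathcal{Y}$. A minimal Python sketch follows, assuming (purely for illustration) that terms are nested tuples, that logic variables are instances of a hypothetical class Var, and that the substitution $\theta$ has already been applied to both the goal and the program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A logic variable, identified by its name."""
    name: str


def free_vars(term):
    """Collect the variables occurring in a term built from nested tuples."""
    if isinstance(term, Var):
        return {term}
    if isinstance(term, tuple):
        found = set()
        for arg in term:
            found |= free_vars(arg)
        return found
    return set()                      # constants such as "p" or 1


def generalizable(goal, program):
    """Variables of the solved goal that do not occur free in the program."""
    program_vars = set()
    for clause in program:
        program_vars |= free_vars(clause)
    return free_vars(goal) - program_vars
```

For a solved goal ancestor y x and a program with no free variables, both x and y may be quantified; if the program contains the temporary assumption p x, then x is excluded.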
Declaratively, lemma E K is simply equivalent to (E, K), the conjunction of E and K. By
the declarative equivalence of two goals $G_1$ and $G_2$, we mean that the given goals follow from
the same logic programs and are satisfied by the same substitutions, i.e., $\theta\mathcal{P} \vdash \theta G_1$ if and
only if $\theta\mathcal{P} \vdash \theta G_2$.

An operational reading, on the other hand, considers the search behavior (e.g., ordering of
selections and backtracking) of a specific logic programming interpretation. As an example
of the difference, consider that the Prolog goal once

once  G  <=  G, !. 

is  declaratively  the  same  as  G:  each  is  satisfied  by  similar  substitutions  and  logic  programs. 
Operationally,  however,  the  interpreter  does  not  backtrack  to  find  alternative  solutions  of 
once  G. 
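
The distinction can be made concrete with Python generators, where each yielded value is read as one solution of a goal. This is an analogy rather than Prolog semantics, and solutions_between is a hypothetical toy goal: once accepts exactly the first answer that the underlying goal would give, but never resumes the search for alternatives.

```python
from itertools import islice


def solutions_between(lo, hi):
    """A toy nondeterministic goal: each yielded value is one solution."""
    for n in range(lo, hi):
        yield n


def once(goal):
    """Commit to the first solution of `goal`, discarding the alternatives."""
    yield from islice(goal, 1)
```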

The  purpose  of  lemma  is  to  affect  termination  and  efficiency  without  affecting  provability: 
lemma  controls  search  by  selectively  expanding  the  program.  This  expansion  is  through  the 
assumption $\forall\mathcal{Y}.\ \theta E$. The savings afforded by lemma is simply that rather than successively
re-deriving and re-instantiating E, it is derived once and then universally generalized to
$\forall\mathcal{Y}.\ \theta E$, so that the latter may be exploited in the solution of multiple goals. As $\theta E$ is
necessarily a consequence of $\mathcal{P}$, $\forall\mathcal{Y}.\ \theta E$ follows from universal generalization via the discharging of
logic variables not free in $\mathcal{P}$. Without the forward reasoning step resulting in the assumption
$\forall\mathcal{Y}.\ \theta E$, the solution of K might not even terminate. Even if $\mathcal{P} \vdash K$ succeeds, the discovered
proof is potentially much longer than that associated with $\{\forall\mathcal{Y}.\ \theta E\} \cup \theta\mathcal{P} \vdash \theta K$ (as is the
case with fib).

Of  course,  lemma  cannot  be  effectively  implemented  as 
lemma M K <= (M, M => K).

While this is a correct implementation in that it is equivalent to the declarative reading
(M, K), it does not realize the operational definition: it does not universally
generalize, and thus will offer the same performance as (M, K).

In fact, lemma cannot be programmed within the existing language, since λProlog affords
no means by which to universally generalize free variables. The universal generalization
step, which is required in the implementation of both lemma and assert, is problematic for
languages with embedded implication. Consider

?- p x => lemma (p x) (p 1, p 2).

Since p x trivially follows from itself, the above might naively be expected to make the
assumption ∀x. p x. But this does not follow from the program, since its declarative
counterpart

?- p x => (p x, (p 1, p 2)).

is not true and, in fact, fails in λProlog.

The problem of defining assert within λProlog is even more dramatic: consider that there
is no obvious meaning for

?- ∃x ∀y. q x y => asserta (r x y).

How can x and y be meaningfully quantified within a globally asserted clause? The difficulty
is that λProlog goals are solved within the scope of a particular set of local assumptions and
variable bindings. They are not free to 'stand alone' as Horn clauses are. Our version of
lemma is reconcilable with λProlog because it too is scoped: lemma's assumption $\forall\mathcal{Y}.\ \theta E$
is only valid within the scope of the deriving context, that is, for the solution of K.
An unscoped assert, on the other hand, does not make sense for λProlog, because its
assumptions are expected to persist beyond the extent of the defining context.


5.2.3  Formal  Definition 

The  operational  definition  of  lemma  may  be  formalized  within  the  following  inference  rule 
(a  la  §3.7): 

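$$\frac{\mathcal{P} \vdash_{\theta} E \qquad \{\forall\mathcal{Y}.\ \theta E\} \cup \theta\mathcal{P} \vdash_{\psi} \theta K}{\mathcal{P} \vdash_{\psi\theta} \mathtt{lemma}\ E\ K} \qquad \text{where } \mathcal{Y} = \mathrm{free}(\theta E) - \mathrm{free}(\theta\mathcal{P}).$$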

The  preceding  operational  reading  ensures  that  lemma  will  succeed  only  if  its  corresponding 
declarative  interpretation  (E  ,  K)  is  valid.  This  property,  the  soundness  of  lemma,  may  be 
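proved by induction on the definition of the $\vdash$ relation (§3.7):

Given $\mathcal{P} \vdash_{\psi\theta} \mathtt{lemma}\ E\ K$, we must show that $\mathcal{P} \vdash (E, K)$ holds under the answer substitution $\psi\theta$.

From the definition of lemma, $\mathcal{P} \vdash_{\theta} E$, and then from the ind.hyp. $\theta\mathcal{P} \vdash \theta E$.

Let $\mathcal{Y} = \mathrm{free}(\theta E) - \mathrm{free}(\theta\mathcal{P})$.

Since $\theta\mathcal{P} \vdash \theta E$ and $\mathcal{Y} \cap \mathrm{free}(\theta\mathcal{P}) = \emptyset$,

it follows by universal generalization that $\theta\mathcal{P} \vdash (\forall\mathcal{Y}.\ \theta E)$.  (1)

Also from the definition of lemma, $\{\forall\mathcal{Y}.\ \theta E\} \cup \theta\mathcal{P} \vdash_{\psi} \theta K$,

and hence by the ind.hyp. $\psi\{\forall\mathcal{Y}.\ \theta E\} \cup \psi\theta\mathcal{P} \vdash \psi\theta K$.  (2)

By cut-elimination9 over (1) and (2), $\psi\theta\mathcal{P} \vdash \psi\theta K$.

Again from (1), $\psi\theta\mathcal{P} \vdash \psi\theta E$.

And thus $\psi\theta\mathcal{P} \vdash \psi\theta E,\ \psi\theta K$.


9The rule of cut-elimination for $\vdash$ states that from $\mathcal{P} \vdash A$ and $\mathcal{P} \cup A \vdash B$, we may conclude $\mathcal{P} \vdash B$.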

5.2.4 Alternative Realizations (Optional)

In order to achieve a correct realization of lemma, it is necessary to suppress the universal
generalization of variables free in $\mathcal{P}$. Through embedded implication, the program
may contain (in a temporary assumption) a free occurrence of x, which invalidates lemma's
universal generalization over x. This can be remedied in two ways: we can either (1)
not generalize over variables free in an assumption (as per lemma's definition above), or
instead (2) collect each of the assumptions containing variables free in E within an enabling
precondition (subgoal) of the lemma we create. For example, consider

?- p x => lemma (p x) (p 1, p 2).

For case (1) the derived assumption is p x, and for case (2) ∀x. p x <= p x, rather than the
incorrect ∀x. p x. As a slightly more complex illustration, given only the program

q x y <= p x, p y.

the query

?- p x => lemma (q x y) K.

would make either the assumption (1) q x x, or (2) ∀y. q y y <= p y, rather than the incorrect
∀x. q x x.

For  our  implementation  of  lemma,  we  chose  the  first  solution,  since  it  seems  to  be  more 
frequently useful. In fact, for most situations, (2) reduces to (1) since the only means for
deriving the additional subgoal associated with (2) will be precisely via the initial assumption:
that  is,  to  derive  the  precondition  p  y  in  the  most  recent  example,  presumably  one  would 
need  the  local  assumption  p  x. 

Effectively  determining  which  variables  appear  free  in  assumptions  is  a  potentially  thorny 
implementation  issue:  given  that  there  are  a  large  number  of  pending  assumptions,  the 
search  required  is  not  insignificant.  Maintaining  a  list  of  such  variables  seems  the  natural 
approach.  Within  §9.2.1  we  discuss  our  implementation  and  its  limitations  in  this  regard. 


5.3  Rule 

5.3.1  Example:  Partial  Evaluation 

Let us now return to the peval_top example introduced in §5.1.1. Recall that the problem
with

peval_top E K <= peval E G, (E <= G) => K.

is that the free variables of E <= G are not universally generalized, thereby restricting the
applicability of the assumption.

Since E <= G, though true, is typically not itself derivable as a goal (it is the result of partial
evaluation, not logic programming execution), the formulation

peval_top E K <= peval E G, lemma (E <= G) K.

is not sufficient. Nor can we augment lemma with a local assumption such as

peval_top E K <= peval E G, (E <= G) => lemma (E <= G) K.

because of the aforementioned restrictions placed upon variables free in assumptions. As
one last attempt, consider

peval_top E K <= lemma (peval E G)
                       ((∀E ∀G. peval E G => (E <= G)) => K).

The reader might hope that this formulation would perform the partial evaluation,
universally generalize the result, and then allow the result to be exploited through one additional
embedded implication ∀E ∀G. peval E G => (E <= G). The problem is that this
implication does not make any sense under logic programming's backtracking search paradigm:
as discussed below, this clause has a variable head, and is hence applicable to any goal
whatsoever.

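Operationally, what we would like to achieve for $\mathcal{P} \vdash \mathtt{peval\_top}\ E\ K$ is

1. Solve $\mathcal{P} \vdash \mathtt{peval}\ E\ G$. If this succeeds with a substitution $\theta$, let $\mathcal{Y}$ be the logic variables
contained in $\theta E$ and $\theta G$ that do not occur free in any current assumption.

2. Assume (i.e., reflect) $\forall\mathcal{Y}.\ \theta E \Leftarrow \theta G$ while solving $\theta K$; that is, solve

$\{\forall\mathcal{Y}.\ \theta E \Leftarrow \theta G\} \cup \theta\mathcal{P} \vdash \theta K$.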

Why is this a sound way of establishing $\theta K$? We need to make three crucial observations:

1. Since we quantify only over those variables which are not free in any current
assumption, we know that $\mathtt{peval}\ \theta E\ \theta G$ is a logical consequence of the program for peval
(because of the logical rule of universal generalization).

2. The programmer knows that if peval E G, then E <= G is a valid clause to add to
the program (assuming peval has been implemented correctly). This is expressed
declaratively within the aforementioned clause

∀E ∀G. peval E G => (E <= G).

3. From a simple forward reasoning step, we conclude that $\forall\mathcal{Y}.\ \theta E \Leftarrow \theta G$ is true, and
hence can be safely assumed before solving $\theta K$.


Trying  to  abstract  from  this  particular  example,  we  can  see  that  we  need  two  pieces  of 
information  in  order  to  carry  out  the  operations  described  above:  the  original  goal  to  be 
solved, peval E G, and the general rule establishing the connection between this goal
and the assumption we would like to make, ∀E ∀G. peval E G => (E <= G). In order to
properly scope assumptions, we also need to pass a goal continuation K as an argument.
This line of reasoning is embodied within our new construct rule. For peval_top, the rule
invocation is

peval_top E K <= rule (peval E G)
                      (∀E ∀G. peval E G => (E <= G))
                      K.

The  forward  reasoning  supported  by  rule  takes  the  form  of  a  single  forward-chaining  step 
(specified  by  an  implication)  that  is  under  the  tight  control  of  the  programmer.  Intuitively, 
that  step  is  to  solve  the  left-hand  side  of  the  implication,  and,  upon  success,  assume  the  right. 
Such  a  step  in  the  forward  direction  is  generally  incompatible  with  the  backchaining  of  the 
logic programming paradigm: were we to include ∀E ∀G. peval E G => (E <= G)
within $\mathcal{P}$, it would be applicable to any implicational G-form. (In fact, after conversion to
normal-form,  the  given  clause  is  applicable  to  any  goal  whatsoever.)  Thus,  this  forward  step 
is  of  little  practical  value  outside  of  the  rule  context. 

5.3.2  The  “rule”  Construct 

The  general  form  of  rule  is 

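$\mathtt{rule}\ G\ (\forall\mathcal{X}.\ G_{\mathcal{X}} \Rightarrow D_{\mathcal{X}})\ K$

for goals $G$, $G_{\mathcal{X}}$, $K$, and clause $D_{\mathcal{X}}$, where $\mathcal{X}$ is a (perhaps empty) subset of the variables
free in $D_{\mathcal{X}}$ or $G_{\mathcal{X}}$. To simplify the discussion, we assume that the variables in $\mathcal{X}$ do not
occur elsewhere.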


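Operational reading. The operational interpretation of
$\mathcal{P} \vdash \mathtt{rule}\ G\ (\forall\mathcal{X}.\ G_{\mathcal{X}} \Rightarrow D_{\mathcal{X}})\ K$
is as follows:

1. Find a minimal substitution $\sigma_{\mathcal{X}}$ (see footnote 10) such that $\mathrm{dom}(\sigma_{\mathcal{X}}) \subseteq \mathcal{X}$ and $\sigma_{\mathcal{X}} G_{\mathcal{X}} = G$. The
existence of $\sigma_{\mathcal{X}}$ guarantees that the forward-chaining step is applicable. Should $\sigma_{\mathcal{X}}$ not
exist, fail and issue a diagnostic message.

2. Solve $\mathcal{P} \vdash G$. If this fails, fail. Otherwise, it succeeds with some substitution $\theta$.

3. Let $\mathcal{Y} = \mathrm{free}(\theta G) - \mathrm{free}(\theta\mathcal{P})$.

4. Let $\mathcal{X}' = \mathcal{X} - \mathrm{dom}(\sigma_{\mathcal{X}})$.

5. Solve

$\{\forall\mathcal{X}'\,\forall\mathcal{Y}.\ \theta\sigma_{\mathcal{X}} D_{\mathcal{X}}\} \cup \theta\mathcal{P} \vdash \theta K$

10Recall from §3.7 that a given substitution $\theta$ is minimal, or most general, with respect to a particular
set of conditions, if $\theta$ satisfies those conditions, and if for any other substitution $\psi$ also satisfying those
conditions, $\psi$ is an instance of $\theta$.
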


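(Relying upon λProlog unification rather than explicit substitution, the preceding
operational description may alternatively be codified as

1. Create new logical variables $\hat{\mathcal{X}}$ for each universal variable in $\mathcal{X}$, and then substitute
$\hat{\mathcal{X}}$ for $\mathcal{X}$ in $G_{\mathcal{X}}$ and $D_{\mathcal{X}}$, yielding $G_{\hat{\mathcal{X}}}$ and $D_{\hat{\mathcal{X}}}$.

2. Unify $G$ and $G_{\hat{\mathcal{X}}}$.

3. Solve $G$.

4. Let $\mathcal{Y} = \mathrm{free}(G) - \mathrm{free}(\mathcal{P})$.

5. Solve $(\forall\hat{\mathcal{X}}\,\forall\mathcal{Y}.\ D_{\hat{\mathcal{X}}}) \Rightarrow K$.

$\forall\hat{\mathcal{X}}$ is the correct quantification, since those variables of $\hat{\mathcal{X}}$ which rule has instantiated no
longer appear free in $D_{\hat{\mathcal{X}}}$.)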
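
The five steps of the operational reading can be sketched in Python. Everything here is an illustrative assumption rather than the eLP implementation: terms are nested tuples, Var is a hypothetical class of logic variables, an assumption is a pair (quantified variables, clause), the caller supplies solve (returning a substitution or None), and simple first-order matching replaces higher-order unification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


def subst(term, s):
    """Apply the substitution `s` (a dict from Var to term) to `term`."""
    if isinstance(term, Var):
        return s.get(term, term)
    if isinstance(term, tuple):
        return tuple(subst(t, s) for t in term)
    return term


def free_vars(term):
    if isinstance(term, Var):
        return {term}
    if isinstance(term, tuple):
        found = set()
        for t in term:
            found |= free_vars(t)
        return found
    return set()


def match(pattern, term, bound, s=None):
    """One-way matching: bind only variables in `bound` so that pattern equals term."""
    s = dict(s or {})
    if isinstance(pattern, Var) and pattern in bound:
        if pattern in s:
            return s if s[pattern] == term else None
        s[pattern] = term
        return s
    if isinstance(pattern, tuple) and isinstance(term, tuple) and len(pattern) == len(term):
        for p, t in zip(pattern, term):
            s = match(p, t, bound, s)
            if s is None:
                return None
        return s
    return s if pattern == term else None


def rule(goal, xs, g_x, d_x, solve, assumptions, k):
    """Solve `goal`, take one forward step g_x => d_x, then run `k` in the extended scope."""
    sigma = match(g_x, goal, xs)                      # step 1
    if sigma is None:
        raise ValueError("forward step is not applicable to the goal")
    theta = solve(goal, assumptions)                  # step 2
    if theta is None:
        return None
    in_scope = set()
    for quantified, clause in assumptions:
        in_scope |= free_vars(subst(clause, theta)) - quantified
    ys = free_vars(subst(goal, theta)) - in_scope     # step 3
    xs_rest = set(xs) - set(sigma)                    # step 4
    new_clause = subst(subst(d_x, sigma), theta)
    return k(assumptions + [(frozenset(ys | xs_rest), new_clause)])   # step 5
```

Raising an exception when no matching substitution exists is this sketch's stand-in for "fail and issue a diagnostic message".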

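Declarative reading. The proper declarative interpretation for
$\mathtt{rule}\ G\ (\forall\mathcal{X}.\ G_{\mathcal{X}} \Rightarrow D_{\mathcal{X}})\ K$
is simply

$G,\ (\forall\mathcal{X}.\ G_{\mathcal{X}} \Rightarrow D_{\mathcal{X}}) \Rightarrow K$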

which,  like  the  reading  for  lemma,  makes  no  mention  of  universal  generalization  whatsoever. 

The difference between the operational and declarative readings illustrates the savings
provided by rule: under the declarative interpretation, multiple instances of the same general
goal G must be solved in order to establish instances of G's consequent D. Operationally,
however, we need solve G only once, universally generalize, and then assume the universal
closure of its consequent D.

In fact, the reason for explicitly including the $\forall\mathcal{X}$, rather than just allowing the variables to
be free, is that it is required by the declarative interpretation: since it may be necessary to
repeat the application of the step $\forall\mathcal{X}.\ G_{\mathcal{X}} \Rightarrow D_{\mathcal{X}}$, $\mathcal{X}$ must contain all variables that may be
reinstantiated during such successive applications.
Note that $\forall\mathcal{X}.\ G_{\mathcal{X}} \Rightarrow D_{\mathcal{X}}$, though true, is never used in the backward-chaining search of
the interpreter; only the result of the forward step is assumed. This is essential as the
clause one typically uses for this forward-chaining step is often hopelessly inefficient, or else
quickly leads to non-termination if used in the reverse direction. We have already given as
an example the step in the definition of peval_top.

That  it  is  the  free  variables  of  G,  rather  than  those  of  D,  which  are  universally  generalized 
is  essential  for  correctness:  consider 

?- rule true (true => p x) (p 1, p 2).

which fails, as established by the invalidity of the declarative reading:

?- true, (true => p x) => (p 1, p 2).

The following variation, on the other hand, should (and does) succeed:

?- rule true (∀x. true => p x) (p 1, p 2).

and, as we would expect, its declarative reading behaves similarly:

?- true, (∀x. true => p x) => (p 1, p 2).


5.3.3  Formal  Definition 

The  above  operational  definition  is  formalized  by  the  inference  rule 

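$$\frac{\mathcal{P} \vdash_{\theta} G \qquad \sigma_{\mathcal{X}} G_{\mathcal{X}} = G \qquad \{\forall\mathcal{X}'\,\forall\mathcal{Y}.\ \theta\sigma_{\mathcal{X}} D_{\mathcal{X}}\} \cup \theta\mathcal{P} \vdash_{\psi} \theta K}{\mathcal{P} \vdash_{\psi\theta} \mathtt{rule}\ G\ (\forall\mathcal{X}.\ G_{\mathcal{X}} \Rightarrow D_{\mathcal{X}})\ K}$$

where $\mathrm{dom}(\sigma_{\mathcal{X}}) \subseteq \mathcal{X}$,

$\sigma_{\mathcal{X}}$ is minimal,11

$\mathcal{X}' = \mathcal{X} - \mathrm{dom}(\sigma_{\mathcal{X}})$, and

$\mathcal{Y} = \mathrm{free}(\theta G) - \mathrm{free}(\theta\mathcal{P})$.

As in the case of lemma, the above operational definition ensures that rule will succeed
only if its corresponding declarative interpretation is valid. This property, the soundness of
rule, is also proved by induction over the $\vdash$ relation (§3.7):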

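Given $\mathcal{P} \vdash_{\psi\theta} \mathtt{rule}\ G\ (\forall\mathcal{X}.\ G_{\mathcal{X}} \Rightarrow D_{\mathcal{X}})\ K$,

we must show that $\mathcal{P} \vdash G$ and $\{\forall\mathcal{X}.\ G_{\mathcal{X}} \Rightarrow D_{\mathcal{X}}\} \cup \mathcal{P} \vdash K$.

From the definition of rule, $\mathcal{P} \vdash_{\theta} G$, and then from the ind.hyp., $\theta\mathcal{P} \vdash \theta G$.

Again from rule's definition, there exists $\sigma_{\mathcal{X}}$ such that $\sigma_{\mathcal{X}} G_{\mathcal{X}} = G$,
where $\mathrm{dom}(\sigma_{\mathcal{X}}) \subseteq \mathcal{X}$ and $\sigma_{\mathcal{X}}$ is minimal.

Thus $\theta\mathcal{P} \vdash \theta\sigma_{\mathcal{X}} G_{\mathcal{X}}$.

Let $\mathcal{Y} = \mathrm{free}(\theta G) - \mathrm{free}(\theta\mathcal{P})$, and let $\mathcal{X}' = \mathcal{X} - \mathrm{dom}(\sigma_{\mathcal{X}})$.

By universal generalization, $\theta\mathcal{P} \vdash \forall\mathcal{X}'\,\forall\mathcal{Y}.\ \theta\sigma_{\mathcal{X}} G_{\mathcal{X}}$.

By weakening, $\theta\{\forall\mathcal{X}.\ G_{\mathcal{X}} \Rightarrow D_{\mathcal{X}}\} \cup \theta\mathcal{P} \vdash \forall\mathcal{X}'\,\forall\mathcal{Y}.\ \theta\sigma_{\mathcal{X}} G_{\mathcal{X}}$.

By universal instantiation (via $\sigma_{\mathcal{X}}$) and $\wedge$-introduction, it follows that
$\theta\{\forall\mathcal{X}.\ G_{\mathcal{X}} \Rightarrow D_{\mathcal{X}}\} \cup \theta\mathcal{P} \vdash \forall\mathcal{X}'\,\forall\mathcal{Y}.\ \theta\sigma_{\mathcal{X}} G_{\mathcal{X}},\ \theta\sigma_{\mathcal{X}}(G_{\mathcal{X}} \Rightarrow D_{\mathcal{X}})$.

And then by distributivity of substitution,
$\theta\{\forall\mathcal{X}.\ G_{\mathcal{X}} \Rightarrow D_{\mathcal{X}}\} \cup \theta\mathcal{P} \vdash \forall\mathcal{X}'\,\forall\mathcal{Y}.\ \theta\sigma_{\mathcal{X}} G_{\mathcal{X}},\ \theta\sigma_{\mathcal{X}} G_{\mathcal{X}} \Rightarrow \theta\sigma_{\mathcal{X}} D_{\mathcal{X}}$.

By modus ponens (rule application) and by induction over the preceding steps,
it then follows that $\theta\{\forall\mathcal{X}.\ G_{\mathcal{X}} \Rightarrow D_{\mathcal{X}}\} \cup \theta\mathcal{P} \vdash \forall\mathcal{X}'\,\forall\mathcal{Y}.\ \theta\sigma_{\mathcal{X}} D_{\mathcal{X}}$.  (1)

Also by the definition of rule, $\{\forall\mathcal{X}'\,\forall\mathcal{Y}.\ \theta\sigma_{\mathcal{X}} D_{\mathcal{X}}\} \cup \theta\mathcal{P} \vdash_{\psi} \theta K$,

and hence by the ind.hyp. $\psi\{\forall\mathcal{X}'\,\forall\mathcal{Y}.\ \theta\sigma_{\mathcal{X}} D_{\mathcal{X}}\} \cup \psi\theta\mathcal{P} \vdash \psi\theta K$.  (2)

Now by cut-elimination over (1) and (2), $\psi\theta\{\forall\mathcal{X}.\ G_{\mathcal{X}} \Rightarrow D_{\mathcal{X}}\} \cup \psi\theta\mathcal{P} \vdash \psi\theta K$.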

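Incompleteness of “rule.” The declarative reading of rule is not, however, equivalent
to its operational definition, as the declarative version may succeed where the operational
fails; that is, the declarative judgment

$\mathcal{P} \vdash \mathtt{rule}\ G\ (\forall\mathcal{X}.\ G_{\mathcal{X}} \Rightarrow D_{\mathcal{X}})\ K$

does not imply the operational judgment

$\mathcal{P} \vdash_{\theta} \mathtt{rule}\ G\ (\forall\mathcal{X}.\ G_{\mathcal{X}} \Rightarrow D_{\mathcal{X}})\ K$

even up to the usual deterministic limitations of logic programming (Chapter 3). This is
because rule's assumption $\forall\mathcal{X}'\,\forall\mathcal{Y}.\ \theta\sigma_{\mathcal{X}} D_{\mathcal{X}}$ is typically less general than $\forall\mathcal{X}.\ \theta D_{\mathcal{X}} \Leftarrow \theta G_{\mathcal{X}}$,
and thus K may follow from the latter, but not from the former. But this is, of course, the
whole purpose of rule: to focus search by making use of a selected consequence ($\forall\mathcal{X}'\,\forall\mathcal{Y}.\ \theta\sigma_{\mathcal{X}} D_{\mathcal{X}}$)
of the general assumption ($\forall\mathcal{X}.\ \theta D_{\mathcal{X}} \Leftarrow \theta G_{\mathcal{X}}$), which, by itself, may be too powerful to be
computationally useful.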

5.3.4  Implementation  Issues 

λProlog constraints. In our discussion, we have thus far ignored a problem posed by
λProlog's higher-order nature: higher-order variables, in addition to being instantiated, can
accumulate constraints in the course of computation. These constraints are essential for
higher-order unification, and have to be represented in the forms manipulated by rule. The
actual form for rule's assumption is $\forall\mathcal{X}'\,\forall\mathcal{Y}.\ \theta\sigma_{\mathcal{X}} D_{\mathcal{X}} \Leftarrow \exists\mathcal{Z}.\ C_{\mathcal{Z}}$, where $C_{\mathcal{Z}}$ is the current set of
constraints, and $\mathcal{Z}$ represents all variables occurring only in $C_{\mathcal{Z}}$. A similar solution for assert
has been proposed for the more general constraint logic programming language CLP(ℛ)12
in [61]: the constraints are therein reduced, to as great an extent as possible, to the variables
occurring in the clause to be assumed, and then added as 'guards' to that clause. (See
also §9.2.2.)


12CLP(ℛ) supports more general constraints such as those expressed within arithmetic inequalities [70].

rtp <= rule (infer R)
            (∀R. infer R => clause R)
            (R = false; rtp).

infer R' <= clause P, clause Q, resolve P Q R, simpl R R', (R' = false; keep? R').

resolve (P ; Q) S (P ; R) <= resolve Q S R.
resolve (P ; Q) S (Q ; R) <= resolve P S R.
resolve S (P ; Q) (P ; R) <= resolve Q S R.
resolve S (P ; Q) (Q ; R) <= resolve P S R.
resolve P (not P) false.
resolve (not P) P false.

keep? R <= write R, write_string “Keep? : ”, read λG. G.


Figure  5.1:  Rudimentary  Resolution  Theorem  Prover. 


5.3.5  Example:  “lemma” 

Versions  of  our  scoped  lemma  introduced  in  §5.2.2  may  now  be  defined  in  terms  of  rule  for 
both  the  non-committing  case: 

lemma E K <= rule E (∀P. P => P) K.

and the committing:

lemma E K <= rule E (∀P. P => (P <= !)) K.


5.3.6  Example:  Resolution 

Consider  rtp,  a  rudimentary  resolution  theorem  prover,  given  in  Figure  5.1.  The  predicate 
clause  enumerates  disjunctive  expressions  to  be  resolved,  such  as 

clause  (p  x  y;  not  (q  y  x)). 
clause  (q  a  z). 

resolve blindly resolves its first two arguments, yielding a resolvent R, which is then
simplified by simpl (whose clauses may be found in Appendix A.1). To illustrate,

?- resolve (p x y ; not (q y x)) (q a z) R, simpl R R'.

instantiates R' = p x a. To avoid infinitely re-deriving the same clause, the user is queried
by the predicate keep? to determine whether R' should be used or discarded. rtp succeeds
if it is able to derive a contradiction (R' = false). rtp first invokes infer, which produces
a resolvent of two clauses. If either R' = false or keep? R' succeeds (i.e., the user enters
true), infer R succeeds. rtp then makes the forward step infer R => clause R, and
assumes the universal closure of clause R before recursively calling rtp. (A more complete
implementation of rtp may be found in Appendix A.2.)
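
The control structure of rtp (infer a resolvent, add it as a clause, repeat) can be illustrated by a propositional saturation loop in Python. This sketch deliberately simplifies the figure: clauses are frozensets of string literals with "~" marking negation, there is no unification, and the interactive keep? filter is replaced by the assumption that a resolvent is kept exactly when it is new.

```python
from itertools import combinations


def negate(lit):
    """Complement of a propositional literal written as 'p' or '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolvents(c1, c2):
    """All clauses obtained by resolving c1 and c2 on one complementary pair."""
    for lit in c1:
        if negate(lit) in c2:
            yield frozenset((c1 - {lit}) | (c2 - {negate(lit)}))


def rtp(clauses):
    """Return True if the empty clause (false) is derivable by resolution."""
    clauses = set(clauses)
    while True:
        new = {
            r
            for a, b in combinations(clauses, 2)
            for r in resolvents(a, b)
        } - clauses
        if frozenset() in new:
            return True
        if not new:                   # nothing further can be inferred
            return False
        clauses |= new                # forward step: inferred resolvents become clauses
```

The loop terminates because only finitely many clauses can be built from a finite set of literals.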

typeof (if E F H) A     <= typeof E bool, typeof F A, typeof H A.
typeof (lam F) (A → B)  <= ∀x. typeof x A => typeof (F x) B.
typeof (appl F E) B     <= typeof F (A → B), typeof E A.
typeof (let E F) B      <= typeof E A, typeof (F E) B.
typeof (fix F) A        <= ∀x. typeof x A => typeof (F x) A.


Figure  5.2:  ML  Type  Inference 


5.3.7  Example:  ML-style  Type  Inference 

As  a  further  application  of  rule,  consider  the  example  of  programming  ML-style  type 
inference,13  as  implemented  by  Hannan  &  Miller  [58].  Some  of  the  more  interesting  rules  for 
type inference are included within Figure 5.2. We are particularly interested in type
inference over the ML let construct. We represent (let x = E in F x) within λProlog as let E F,
where F is a λ-abstraction. (This reverses the order of arguments used within Hannan &
Miller's representation.) Type inference for this construct can be captured by the following
λProlog clause [58]:

typeof (let E F) B <= typeof E A, typeof (F E) B.

The problem with the above formulation is that the type of E is computed once (to ensure
that it is indeed typable), and then thrown away. Instances of E are then re-typed at
each occurrence of x within λx. F x. This is necessary because the type of E, namely A,
could be polymorphic, i.e., contain variables, such as the C → C typing of the identity
λx. x. Without this re-computation, a polymorphic E can only be assigned one typing (e.g.,
int → int), since in the course of matching that type, the logical variable C would be
instantiated to int, thus preventing it from matching, say, bool → bool later.

Now  consider  another  formulation 

typeof (let E F) B <= ∀x. typeof E A,
        (∀A. typeof x A <= typeof E A) => typeof (F x) B.

As in the previous encoding, the initial typeof E A ensures that E has a valid typing (which
is necessary in the case that the argument x does not occur in the body F). Now, however,
rather than type F E, we type F x using the additional rule

∀A. typeof x A <= typeof E A.


13ML,  a  polymorphic  programming  language,  is  introduced  within  [89]  and  standardized  within  [90].  ML- 
style type inference is akin to type inference over the simply-typed λ-calculus of §3.2, except that ML includes
the  let  construct  (discussed  below). 
This version simply separates the re-computation of E's type from that of typing F. Just as
before,  different  occurrences  of  x  may  be  given  different  types,  and,  just  as  before,  the  type 
of  x  (and  hence  the  type  of  E)  is  re-computed  from  scratch  at  every  occurrence. 

Once  the  re-computation  has  been  separated,  however,  it  can  be  avoided  entirely  using  the 
universal  generalization  and  limited  amount  of  forward  reasoning  afforded  by  rule: 

typeof (let E F) B <= ∀x. rule (typeof E A)
                              (∀A. typeof x A <= typeof E A)
                              (typeof (F x) B).

This makes an assumption of the form $\forall\mathcal{Y}.\ \mathtt{typeof}\ x\ A$ while inferring the type of the body
F x. $\mathcal{Y}$ includes exactly those type variables in A which are not free in any assumption, thus
directly expressing the restriction on the type inference rule for let. We do not lose any
solutions, since ML has the principal type property, and therefore all solutions to typeof E A'
are instances of the assumption $\forall\mathcal{Y}.\ \mathtt{typeof}\ E\ A$.
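
The restriction on let that rule expresses here corresponds to the familiar generalize and instantiate operations of ML-style type inference. The following Python sketch is illustrative only: types are tuples such as ("var", n), ("->", a, b) and ("int",), an environment maps names to pairs (quantified variables, type), and all function names are hypothetical.

```python
import itertools

_fresh = itertools.count()


def fresh():
    """Create a new type variable."""
    return ("var", next(_fresh))


def type_vars(t):
    """Type variables occurring in a type."""
    if t[0] == "var":
        return {t}
    if t[0] == "->":
        return type_vars(t[1]) | type_vars(t[2])
    return set()                      # base types such as ("int",)


def generalize(env, t):
    """Quantify exactly the type variables of `t` that are not free in `env`."""
    env_vars = set()
    for quantified, body in env.values():
        env_vars |= type_vars(body) - quantified
    return (frozenset(type_vars(t) - env_vars), t)


def instantiate(scheme):
    """Replace each quantified variable by a fresh one (one copy per use)."""
    quantified, body = scheme
    renaming = {v: fresh() for v in quantified}

    def rename(t):
        if t[0] == "var":
            return renaming.get(t, t)
        if t[0] == "->":
            return ("->", rename(t[1]), rename(t[2]))
        return t

    return rename(body)
```

Typing the identity once as C → C and generalizing over C lets each occurrence of x receive its own copy of the type, so one occurrence can be used at int → int and another at bool → bool without re-typing E. If C is free in the environment it is not quantified, and every occurrence shares the same C.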


5.4  Explanation-Based  Learning  (EBL) 

The  rule  construct  has  allowed  us  to  write  programs  which  could  not  be  straightforwardly 
expressed in λProlog, such as the resolution theorem prover, as well as allowed us to formulate
programs  more  efficiently,  such  as  type  inference  for  ML.  Moreover,  the  ideas  behind  rule 
carry  over  to  the  problem  of  explanation-based  generalization  and  learning,  which  is  the 
topic  of  this  section. 

Assimilation bridges the gap between explanation-based generalization and
explanation-based learning, where the latter additionally requires a means for incorporating
generalizations within the logic program. The programmer controls EBG via extensions of lemma and
rule, namely lemma_ebg and rule_ebg, which behave analogously except that their assumptions
are instead explanation-based generalizations. And as before, lemma_ebg will turn out to
be a special case of its more general counterpart, rule_ebg.

The  following  illustrates  how  the  explanation-based  generalizations  of  Chapter  4  could  be 
derived  and  then  assumed  in  the  scope  of  some  further  computation  K:  for  the  suicide 
example  of  §4.2,  the  solution  is 

?- lemma_ebg (kill john john) K.

and for the symbolic integration problem of §4.4,

?- lemma_ebg (intgr (λx. 3 * x^2 + cos x) H) K.

5.4.1 The “rule_ebg” Construct

Rather than first considering lemma_ebg, we move directly to the general case, rule_ebg. The general form for rule_ebg is

rule_ebg G (□ ∀X. G_X ⇒ D_X) K

The formulation of rule_ebg depends upon the □ operator introduced in §4.3: recall that the necessary truth of rules derived through EBG is captured with the □ prefix. To ensure the modal validity of rule_ebg’s assumption, we require that the forward inference step also be necessarily true.14


Operational  reading.  rule_ebg’s  operational  interpretation  is  as  follows: 

1. Solve 𝒫 ⊢_θ G with EBG enabled. If this fails, backtrack. Otherwise, it yields some explanation-based generalization

□ ∀Y. θGG_Y ⇐ θDD_Y.

where GG_Y is the generalized query, DD_Y captures the preconditions of the generalization (the choice of the symbol ‘DD’ will be motivated within §8.6), Y may appear free within GG_Y and DD_Y, and θGG_Y necessarily has θG as an instance. The latter is a consequence of the EBG algorithm itself: the original query must be an instance of the generalized query (§8.6).

2. Find minimal substitutions σ_X and σ_Y such that dom(σ_X) ⊆ X, dom(σ_Y) ⊆ Y, and σ_X θG_X = σ_Y θGG_Y.

3. Let X′ = X − dom(σ_X).

4. Let Y′ = Y − dom(σ_Y).

5. Solve

{□ ∀X′. ∀Y′. σ_X σ_Y θ(D_X ⇐ DD_Y)} ∪ θ𝒫 ⊢ θK.


(As with rule, we may again rely upon λProlog unification to derive a more logic programming-oriented interpretation of rule_ebg:

1. Solve G with EBG enabled, resulting in the explanation-based generalization □ ∀Y. θGG_Y ⇐ θDD_Y.

2. Create new logical variables X̂ for each universal variable in X, and then substitute X̂ for X in G_X and D_X, yielding G_X̂ and D_X̂.

3. Create new logical variables Ŷ for each universal variable in Y, and then substitute Ŷ for Y in GG_Y and DD_Y, yielding GG_Ŷ and DD_Ŷ.

4. Unify GG_Ŷ and G_X̂.

5. Solve (□ ∀Ŷ. ∀X̂. D_X̂ ⇐ DD_Ŷ) ⇒ K.)


14 We do not use !! in place of □∀ for rule_ebg’s forward inference step as the former is only permitted at the top-level of program clauses.


Declarative reading. While rule_ebg differs from rule in the obvious way, the same declarative reading is applicable to both! This is because, in the same sense as rule, rule_ebg does not permit assumptions not derivable from the logic program. The proof relies upon the validity of our EBG algorithm (established in §8.6), and then employs techniques analogous to those used in the proof for rule (§5.3.3). We omit this proof as it requires deriving formal soundness properties under an inference system extended with □ (addressed in §8.3) and with EBG (not addressed), but we do provide a formal characterization of λ□Prolog generalization in the form of an abstract interpreter in §8.6.

As is the case with rule, rule_ebg must also take λProlog constraints into account. The situation is handled analogously. (See also §9.2.2.)

5.4.2  Example:  “lemma_ebg” 

We  may  now  define  lemma_ebg  in  terms  of  rule_ebg: 

lemma_ebg E K ⇐ rule_ebg E (□ ∀E. E ⇒ E) K.

Similarly, a committing version may be defined as

lemma_ebg E K ⇐ rule_ebg E (□ ∀E. E ⇒ (E ⇐ !)) K.


Additional illustrations of rule_ebg appear in the remaining chapters.


Chapter 6


Search  Control  via  Tactics  and 
Programmable  Learning 


The  integration  example  we  presented  within  §4.4  relied  upon  logic  programming’s  implicit 
search  to  solve  queries.  Additional  levels  of  search  control  need  not,  however,  interfere 
with  the  underlying  process  of  EBG!  We  demonstrate  this  by  implementing  a  tactic-based 
solution  of  the  symbolic  integration  problem.  Search  is  controlled  within  a  tactic-based 
theorem  prover  (or  problem  solver)  by  requiring  the  user  to  a  priori  or  interactively  specify  a 
combination  of  proof  steps,  or  tactics ,  with  which  to  attempt  the  derivation  of  a  goal  [50,  20]. 
This  combination  of  tactics  guides  the  construction  of  an  actual  proof  (or  problem  solution). 


6.1 Example: Tactic-Style Symbolic Integration


Once  again,  our  presentation  herein  focuses  upon  the  most  relevant  and  interesting  aspects 
of the example; the unabridged tactic-based problem solver may be found in Appendix A.5.

Tactics  are  simply  named  rules:  for  the  integration  domain,  we  have 


!! tac constant      (intgr (λx. a) (λx. a * x))                true.
!! tac power         (intgr (λx. x^a) (λx. x^(a+1)/(a+1)))      true.
!! tac constant_left (intgr (λx. a * f x) (λx. a * f′ x))       (intgr f f′).
!! tac plus          (intgr (λx. f x + h x) (λx. f′ x + h′ x))  (intgr f f′, intgr h h′).
   tac cos_tac       (intgr cos sin)                            true.

Tactics perform goal reduction: the input goal G_in (2nd argument) is reduced to a more easily solved subgoal G_out (3rd argument).
To represent compositions of tactics, we have problem independent meta-tactics, or tacticals, such as

!! tac idtac          G_in G_in.
!! tac (then T1 T2)   G_in G_out ⇐ tac T1 G_in G_med, tac T2 G_med G_out.
!! tac (orelse T1 T2) G_in G_out ⇐ tac T1 G_in G_out; tac T2 G_in G_out.
!! tac (try T)        G_in G_out ⇐ tac (orelse T idtac) G_in G_out.
!! tac (complete T)   G_in true  ⇐ tac T G_in true.
!! tac (repeat T)     G_in G_out ⇐ tac (orelse (then T (repeat T)) idtac) G_in G_out.

Tacticals are applied to compound goals (i.e., those containing logical connectives) via maptac:

tac (maptac T) true true.
tac (maptac T) (G_in1 , G_in2) G_out ⇐ !, tac (maptac T) G_in1 G_out1,
                                          tac (maptac T) G_in2 G_out2,
                                          simpl (G_out1 , G_out2) G_out.
tac (maptac T) (G_in1 ; G_in2) G_out ⇐ !, tac (maptac T) G_in1 G_out1,
                                          tac (maptac T) G_in2 G_out2,
                                          simpl (G_out1 ; G_out2) G_out.
tac (maptac T) G_in G_out ⇐ tac T G_in G_med,
                            simpl G_med G_out.

where the clauses for simpl may be found in Appendix A.1. (The above tacticals were to a large degree borrowed from Felty [44, pp.143-149].)
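
The behaviour of maptac on conjunctive goals can be mimicked outside the logic programming setting. The following Python sketch is only an illustration under stated assumptions: a goal is `True`, an atomic value, or a tuple `("and", g1, g2)`; a tactic is a function returning a reduced goal, or `None` on failure; the `simpl` shown here merely collapses solved conjuncts and is a stand-in for the clauses of Appendix A.1, which are not reproduced in this section; `solve_p` is a hypothetical tactic. Disjunctive goals and backtracking are not modelled.

```python
def simpl(goal):
    """Collapse solved conjuncts (an assumed stand-in for the simpl clauses)."""
    if isinstance(goal, tuple) and goal and goal[0] == "and":
        left, right = simpl(goal[1]), simpl(goal[2])
        if left is True:
            return right
        if right is True:
            return left
        return ("and", left, right)
    return goal


def maptac(tactic):
    """Apply `tactic` to every atomic goal inside a (possibly compound) goal."""
    def run(goal):
        if goal is True:                       # tac (maptac T) true true.
            return True
        if isinstance(goal, tuple) and goal and goal[0] == "and":
            left = run(goal[1])
            if left is None:
                return None
            right = run(goal[2])
            if right is None:
                return None
            return simpl(("and", left, right))
        reduced = tactic(goal)                 # atomic goal: apply T, then simplify
        return None if reduced is None else simpl(reduced)
    return run


def solve_p(goal):
    """Hypothetical tactic: solves the atomic goal "p" and reduces "q" to "p"."""
    if goal == "p":
        return True
    if goal == "q":
        return "p"
    return None
```

A conjunction whose parts are all solved collapses to `True`, while a conjunction with one remaining subgoal reduces to that subgoal alone.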

We  augment  the  above  with  a  special  interactive  tactical: 


!! tac interactive G_in G_out ⇐ write_string “Goal to be reduced : ”, write G_in,
                                newline, write_string “Enter tactic/tactical : ”,
                                read λT. tac T G_in G_med, ((G_med = true, G_out = true)
                                                           ; tac interactive G_med G_out).


Now  to  solve  the  query 

?- tac interactive (intgr (λx. 2 * (3 * cos x)) H) G_out.

we could enter the series of tactics constant_left, constant_left, and cos_tac as prompted; or equally, the tactical

then (repeat constant_left) cos_tac

yielding

H = λx. 2 * (3 * sin x)
G_out = true
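
To make the goal-reduction reading concrete, here is a minimal Python sketch of the constant_left and cos_tac tactics together with the then, orelse, and repeat tacticals. It is not the thesis's implementation: logic variables and unification are replaced, as an assumption of this sketch, by an explicit `build` function that reconstructs the antiderivative, integrands are encoded as small tuples, and `orelse` commits to its first successful alternative rather than backtracking.

```python
# A goal is ("open", integrand, build) or ("true", answer).  `build` turns an
# antiderivative of `integrand` into an antiderivative of the original problem.
# Integrands: ("cos",), ("sin",), or ("scale", a, f) standing for λx. a * f x.

def constant_left(goal):
    """intgr (λx. a * f x) (λx. a * f' x) reduces to intgr f f'."""
    if goal[0] == "open" and goal[1][0] == "scale":
        _, (_, a, f), build = goal
        return ("open", f, lambda anti: build(("scale", a, anti)))
    return None                      # tactic not applicable


def cos_tac(goal):
    """intgr cos sin reduces to true."""
    if goal[0] == "open" and goal[1] == ("cos",):
        return ("true", goal[2](("sin",)))
    return None


def idtac(goal):
    return goal


def then(t1, t2):
    def run(goal):
        mid = t1(goal)
        return None if mid is None else t2(mid)
    return run


def orelse(t1, t2):
    def run(goal):
        out = t1(goal)
        return out if out is not None else t2(goal)
    return run


def repeat(t):
    # repeat T = orelse (then T (repeat T)) idtac, unfolded only when applied
    def run(goal):
        return orelse(then(t, repeat(t)), idtac)(goal)
    return run


def integrate(integrand, tactical):
    return tactical(("open", integrand, lambda anti: anti))
```

Under these assumptions, `integrate(("scale", 2, ("scale", 3, ("cos",))), then(repeat(constant_left), cos_tac))` plays the role of the query above, and the same tactical also handles integrands with fewer or more constant factors.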
Entering the same series of tactics or the same tactical to the query

?- lemma_ebg (tac interactive (intgr (λx. 2 * (3 * cos x)) H) G_out)
             top.

leads to the assimilation of the explanation-based generalization

!! tac interactive (intgr (λx. a * (b * f x)) (λx. a * (b * f′ x)))
                   (intgr f f′).

(In §6.3 we discuss why it is not desirable for interactive to appear within the generalization.)


As  a  somewhat  more  complex  illustration,  the  query 

?- lemma_ebg (tac interactive (intgr (λx. 2 + (3 * x^2)) H) G_out)
             top.

when solved, for example, by the tactical

then plus (maptac (orelse constant (then constant_left power)))

assimilates the generalization

!! tac interactive (intgr (λx. a + (b * x^c)) (λx. (a * x) + (b * x^(c+1)/(c+1))))
                   true.


6.2  Level  of  Generalization 

As  one  would  expect,  the  above  explanation-based  generalizations  are  applicable  to  problems 
not  addressed  by  the  original  tactical: 

λx. 2 * (3 * sin x)

λx. a + (y * (3 * x))

This  is,  of  course,  because  tactics  of  the  training  theory  are  abstracted  (as  well  as  constants 
of  the  original  goal). 

At  the  same  time,  the  derived  rules  do  not  cover  the  range  of  problems  for  which  the  given 
tacticals  are  applicable:  consider  that  the  first  tactical 

then (repeat constant_left) cos_tac

solves each of the integrals

λx. cos x
λx. 4 * cos x
λx. 2 * (3 * cos x)
λx. a * (y * (3 * cos x))

and so on.

Because the tactical- or meta-level is formulated completely within 𝒟, generalization does not occur at that level; instead generalization is confined to the tactic- or rule-level. This is, of course, exactly what we were after when we set out to make the additional level of search control transparent from the perspective of EBG. Alternative formulations could produce generalizations at the tactical-level, but those derived rules are more likely to be so general that they would be difficult to apply [37].
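
The difference in coverage can be illustrated with a small Python sketch of the first derived rule, read as a fixed-shape pattern. The tuple encoding of integrands (`("scale", a, f)` for λx. a * f x) and the function name `derived_rule` are assumptions made for this illustration only.

```python
def derived_rule(integrand):
    """Derived rule: intgr (λx. a * (b * f x)) reduces to intgr f.

    Returns (subgoal, rebuild) on a match and None otherwise; `rebuild`
    wraps an antiderivative of f back into λx. a * (b * f' x).
    """
    if integrand[0] == "scale" and integrand[2][0] == "scale":
        _, a, (_, b, f) = integrand
        return f, lambda anti: ("scale", a, ("scale", b, anti))
    return None
```

The rule matches exactly two nested constant factors around an arbitrary f (so it applies to 2 * (3 * sin x), leaving the integral of sin as a subgoal), but it does not match cos x or 4 * cos x, which the repeat-based tactical handles.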


6.3  Level  of  Assimilation 

As  discussed  within  Chapter  5,  under  the  traditional  approach  to  learning,  the  problem  solver 
produces  and  assimilates  generalizations  in  the  course  of  solving  queries.  Such  an  approach 
to assimilation is, however, problematic for tactic-based paradigms. In the above example, although generalization occurs only at the level of tactics, the derived rule nevertheless contains a reference to the tactical interactive. If we are to maintain a strict separation of the rule-level and meta-level, it does not make sense to assimilate a generalization encompassing both levels. Rather, a slightly modified generalization could be assimilated at the rule-level as a derived tactic:

!! tac constant_left_two (intgr (λx. a * (b * f x)) (λx. a * (b * f′ x)))
                         (intgr f f′).

Moreover, this assimilation of a derived tactic can be achieved through the limited forward reasoning provided by rule_ebg:

?- rule_ebg (tac interactive (intgr (λx. 2 * (3 * cos x)) H) G_out)
            (∀G_in ∀G_out. tac interactive G_in G_out ⇒ tac constant_left_two G_in G_out)
            top.

The  point  here,  and  it  is  an  important  one,  is  that  it  is  the  user  (or  client  program),  rather 
than  the  problem  solver,  which  is  in  a  position  to  control  assimilation  in  this  situation.  If  we 
were  to  instead  directly  assimilate  the  original  generalization,  we  compromise  the  predicate 
interactive  in  that  a  subsequent  invocation  might  no  longer  prompt  the  user;  that  is,  we 
compromise  the  user’s  control  over  search. 

This  example  reinforces  our  belief  that  for  such  applications  EBG  should  be  a  feature  of 
the  language  in  which  problem  solvers  are  coded,  rather  than  a  ‘black  box’  within  the 
problem  solving  architecture.  In  other  words,  what  is  required  is  a  language  in  which  one 
can  program  the  learning  mechanism.  By  providing  the  programmer  with  an  explicit  means 
to control generalization and assimilation, we defer the difficult problem of determining when
to  generalize  and  assimilate  [72].  Client  programs  have  the  potential  advantage  of  bringing 
domain  knowledge  and  user  interaction  to  bear  in  determining  what  is  to  be  learned.  This 
concept  of  programming  generalization  and  learning  within  the  same  language  in  which 
problem  solving  and  interaction  occur  is  markedly  different  from  what  we  label  ‘black-box’ 
learning.  Hence  our  approach  stands  in  contrast  to  systems  such  as  Soar  [75],  Prodigy  [92], 
and  LEAP  [96]  in  which  learning  is  largely  relegated  to  the  system. 


6.4 Operationality vs. □, Revisited

While §4.5 illustrated that both □ and operationality criteria serve to define EBG’s generalized proofs (and hence its results), the tactic example above demonstrates that the mechanisms are not interchangeable: consider that a formulation of the integration domain that replaces □ with operationality criteria (defined via the predicate oper) requires specifying

oper  (tac  interactive  (intgr  cos  sin)  true). 

The  problem  is  again  that  this  definition  forces  the  mixing  of  the  rule-  and  meta-level, 
thereby  violating  the  modularity  of  our  encoding. 


6.5  An  EBG  Tactical 

As  presented  within  §6.3,  the  following  query  represents  a  way  to  perform  EBG  over  inter¬ 
active  tactic-based  problem  solving: 

?- rule_ebg (tac interactive (intgr (λx. 2 * (3 * cos x)) H) G_out)
            (∀G_in ∀G_out. tac interactive G_in G_out ⇒ tac constant_left_two G_in G_out)
            top.

EBG need not, however, be separated from the meta-level: consider the special generalization tactical

!! tac (ebg_tac Tac) G_in G_out ⇐ rule_ebg (tac interactive G_in G_med)
                                           (∀G_in ∀G_med. tac interactive G_in G_med
                                                          ⇒ tac Tac G_in G_med)
                                           (tac interactive G_med G_out).

At any point in the interactive solution of a goal G_in, the user may initiate EBG via ebg_tac, which takes the name Tac of the tactic to be derived as an argument (e.g., constant_left_two). rule_ebg, in turn, recursively invokes interactive to reduce the initial goal G_in to some other goal G_med. When this nested invocation returns (or ‘pops’), which would result from the user entering idtac as prompted, rule_ebg uses the resulting explanation-based generalization to derive a new tactic Tac. As a result of the forward chaining step, this newly derived tactic is assimilated, and thus made available (at the user’s request) for application within the solution of G_med. Moreover, if we want to retain the result of EBG after solving G_med, simply replace the third argument of rule_ebg with

(tac interactive G_med G_out, top)

(A further illustration of the EBG tactical may be found in Appendix A.6.)




Chapter  7 

Program  Transformation  and 
Apprentice  Learning 


As  introduced  within  §2.1,  one  paradigm  for  formal  program  development  is  that  of  program 
transformation  [13,  68,  42,  102].  Under  a  transformational  approach,  an  abstract  specifica¬ 
tion  of  an  algorithm  is  refined,  or  specialized ,  through  a  sequence  of  formal  elaboration  steps, 
or  transformations ,  into  a  program  with  acceptable  performance.  The  resulting  sequence  of 
transformations,  or  meta-program,  along  with  the  initial  specification  serve  as  a  derivation , 
or  justification,  of  the  optimized  program. 

In  that  they  encode  named  incremental  problem  solving  steps  subject  to  composition,  pro¬ 
gram  transformations  are  akin  to  tactics.  The  difference  is,  of  course,  that  transformations 
operate  on  programs  (or  subexpressions  of  programs)  rather  than  upon  goals.  Also,  tac¬ 
tics  embody  theorem  proving  steps,  which  are  generally  directional  (reducing  goals  to  more 
easily  solved  subgoals),  while  there  is  typically  no  clear  directionality  to  transformations. 
Typically,  transformations  map  one  program  to  a  functionally  equivalent  version  that  may 
have  different  performance  characteristics. 

7.1  Example:  Tail  Recursion 

We  illustrate  EBG  over  a  transformation  system  we  have  applied  to  induce  tail  recursion  in 
certain  situations.1  (From  a  tail  recursive  version,  an  iterative  form  could  easily  be  derived.) 

We  begin  with  a  functional  specification  of  the  factorial  program: 

fix λfact. lam λn. if (equals n 0)
                      1
                      ((appl fact (n - 1)) * n)


1 This example is treated more abstractly within Dietzen & Scherlis [33], among others.

The above is a λProlog abstract syntax for a simple functional language. The constructs lam and appl represent object-level λ-abstraction and application, respectively. (The incorporation of explicit notation allows us to distinguish meta- and object-level. This provides programmer control over operations such as β-reduction: that is, we can write (appl (lam λx. x) 1) without λProlog performing the reduction, as it would for the direct representation ((λx. x) 1) ⇒β 1. β-reduction is then handled explicitly by replacing (appl (lam f) x) with f x.) Finally, the fixpoint or recursion operator fix is ‘applied’ by substituting the fix expression itself for each occurrence of the bound identifier within its body.
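
The separation of object-level and meta-level abstraction can be imitated in Python by letting host-language functions play the role of meta-level λ-terms. The sketch below is an illustration under that assumption: object terms are tuples tagged "lam", "appl", or "fix", and the helper names `beta` and `unfold` are introduced here rather than taken from the thesis.

```python
def beta(term):
    """One explicit object-level β-step: (appl (lam f) x) becomes f x."""
    if term[0] == "appl" and isinstance(term[1], tuple) and term[1][0] == "lam":
        return term[1][1](term[2])
    return term


def unfold(term):
    """Expand (fix body) once: the fix term replaces the bound identifier."""
    assert term[0] == "fix"
    return term[1](term)


identity = ("lam", lambda x: x)      # lam λx. x
redex = ("appl", identity, 1)        # appl (lam λx. x) 1, not yet reduced
```

Constructing `redex` does not reduce it; the reduction happens only when `beta` is applied, which mirrors the programmer control described above.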

The  derivation  proceeds  by  applying  transformations  to  this  specification.  For  example,  the 
following  transformation  replaces  an  occurrence  of  e  with  op  e  z,  where  z  is  a  right  identity 
of  op  (for  example,  mapping  a  to  a  +  0): 

!! add_id_right op C (C e) (C (op e z)) ⇐ right_identity op z.

The third and fourth arguments match the input and output object programs, respectively. The second argument C specifies a context, i.e., the particular subexpression of the input program to be transformed. These higher-order context variables serve to formally encode subterm or occurrence selection, which might, for example, result from “pointing with a mouse” [106]. This represents yet another application of higher-order representation language: the formal expression of occurrences. For example, within the following invocation of the transformation

?- add_id_right (λx. λy. x + y) (λg. g * h) (a * b) F_out.

the context variable is C = λg. g * h. From the definition of add_id_right above, C is applied to e and then matched against the input a * b: that is,

C e = (λg. g * h) e
    =β e * h
    = a * b

Thus, e is instantiated to a and h to b. Now, given that

right_identity (λx. λy. x + y) 0

the output F_out is instantiated as follows:

F_out = C (op e z)
      = (λg. g * b) ((λx. λy. x + y) a 0)
      =β (λg. g * b) (a + 0)
      =β (a + 0) * b
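
The role of the context variable can be sketched in Python by representing a context as a function from the selected subterm to the whole term. This is a simplified illustration, not the higher-order matching performed by the logic programming system: here e and the context are passed in directly, terms are encoded as tuples, and `evaluate` is a hypothetical helper used only to check that the transformation preserves meaning.

```python
def add_id_right(op, z, context, e):
    """Return the pair (C e, C (op e z)), assuming z is a right identity of op."""
    return context(e), context(op(e, z))


def evaluate(term, env):
    """Evaluate a term built from numbers, variable names, and ("+" or "*", l, r)."""
    if isinstance(term, tuple):
        kind, left, right = term
        l, r = evaluate(left, env), evaluate(right, env)
        return l + r if kind == "+" else l * r
    if isinstance(term, str):
        return env[term]             # variable lookup
    return term                      # numeric literal


plus = lambda x, y: ("+", x, y)      # op = λx. λy. x + y
context = lambda g: ("*", g, "b")    # C  = λg. g * b
f_in, f_out = add_id_right(plus, 0, context, "a")
```

With these definitions `f_in` encodes a * b and `f_out` encodes (a + 0) * b, and both evaluate to the same number for any binding of a and b.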

The full derivation, which consists of a sequence of ten such transformation rules and the associated contexts, constitutes a meta-program, i.e., a program that manipulates an object program such as fact. Like tacticals, meta-programs may be specified a priori, or constructed interactively. Ideally, interactive construction of meta-programs consists of alternately naming a transformation rule, and then selecting an appropriate context. And ideally, context selection would be derived by translating mouse input into the appropriate higher-order context. However, as eLP currently lacks the necessary interface to interpret mouse events, contexts have herein been hand-coded. We first describe an a priori meta-program tail_rec embodying this derivation. We later provide, in Appendix A.6, an implementation supporting the interactive construction of a meta-program equivalent to tail_rec.

7.1.1  Derivation 

We now enumerate the individual steps of the tail-recursive fact derivation. Our discussion will focus upon the abstract nature of the transformations, rather than upon the low-level details of transformation application itself; the latter was treated to some extent for add_id_right above. After grasping this abstract description of the derivation, the reader may then want to review the λ□Prolog representation of these transformations (Figure 7.1), and their application with the appropriate contexts via the meta-program tail_rec (Figure 7.2). However, for many readers the intimate details of both the abstract derivation and its λ□Prolog counterpart may prove too tedious to be of interest. Indeed, there is nothing new in these transformations, except, to some degree, their representation within higher-order language. (The case for using higher-order language to represent program transformations is argued by Huet & Lang [68], and more recently by Pfenning & Elliott [106], and Hannan & Miller [59].) It is the application of higher-order EBG to the whole process which is our contribution. For those readers more interested in the latter, I suggest you skip ahead to the discussion of §7.1.2.

0.  We  begin  with  the  initial  definition  of  fact. 

fix λfact. lam λn. if (equals n 0)
                      1
                      ((appl fact (n - 1)) * n)

1. η-expand the term in the object-language; that is, insert a lam and an appl. (‘...’ elides the body of fact.)

lam λn. appl (fix λfact. ...)
             n

(The above n is distinct from that within fact's body.)

2. Insert a multiplication by 1. This transformation relies upon right_identity (λx. λy. x * y) 1.

lam λn. (appl (fix λfact. ...)
              n) * 1

3. Abstract over 1; that is, make it a parameter. This introduces a second argument which is to become the accumulator within the eventual tail recursive version.

appl (lam λm. lam λn.
        (appl (fix λfact. ...)
              n) * m)
     1

4. Name the resulting two argument function fact1: since fix specifies the expansion of recursive functions, one may think of it as a mechanism for function definition. This initial definition of fact1 will be used later in the derivation.

appl (fix λfact1. lam λm. lam λn.
        (appl (fix λfact. ...)
              n) * m)
     1

5. Unfold the recursive definition of fact; that is, expand the fixpoint operator once.

appl (fix λfact1. lam λm. lam λn.
        (appl (lam λn′. if (equals n′ 0)
                           1
                           ((appl (fix λfact. ...) (n′ - 1)) * n′))
              n) * m)
     1

6. β-reduction in the object-language; that is, (appl (lam λn′. f n′) n) ⇒ f n.

appl (fix λfact1. lam λm. lam λn.
        (if (equals n 0)
            1
            ((appl (fix λfact. ...) (n - 1)) * n))
        * m)
     1

7. Distribute * over if.

appl (fix λfact1. lam λm. lam λn.
        if (equals n 0)
           (1 * m)
           (((appl (fix λfact. ...) (n - 1)) * n) * m))
     1

8. Simplify the then-clause using the fact that left_identity (λx. λy. x * y) 1.

appl (fix λfact1. lam λm. lam λn.
        if (equals n 0)
           m
           (((appl (fix λfact. ...) (n - 1)) * n) * m))
     1

9. Re-associate the multiplicative expression of the else-clause, since associative (λx. λy. x * y).

appl (fix λfact1. lam λm. lam λn.
        if (equals n 0)
           m
           ((appl (fix λfact. ...) (n - 1)) * (n * m)))
     1

10. Observe that within step 9 the subexpression

(appl (fix λfact. ...) (n - 1)) * (n * m)

is a higher-order instance of the original definition of fact1 given in step 4:

fix λfact1. lam λm. lam λn.
   (appl (fix λfact. ...)
         n) * m

The only difference is the values of the arguments m and n. This means that we may fold the above expression into a fact1 invocation.

appl (fix λfact1. lam λm. lam λn. if (equals n 0)
                                     m
                                     (appl (appl fact1 (n * m)) (n - 1)))
     1

This  completes  the  derivation. 
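
As a sanity check on the endpoints of the derivation, the initial specification and the final tail-recursive form can be transcribed into Python and compared. This is an informal check on sample inputs, not a proof of correctness; `fact_loop` is an additional, assumed illustration of the iterative form that the text says can be derived from the tail-recursive version.

```python
def fact(n):
    # step 0: fix λfact. lam λn. if (equals n 0) 1 ((appl fact (n - 1)) * n)
    return 1 if n == 0 else fact(n - 1) * n


def fact1(m, n):
    # step 10: if (equals n 0) m (appl (appl fact1 (n * m)) (n - 1))
    return m if n == 0 else fact1(n * m, n - 1)


def fact_loop(n):
    # iterative form read off from the tail call, with accumulator m
    m = 1
    while n != 0:
        m, n = n * m, n - 1
    return m
```

The step 4 definition of fact1 corresponds to the invariant `fact1(m, n) == fact(n) * m`, so `fact1(1, n)` agrees with `fact(n)`.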


As  mentioned  above,  each  of  the  preceding  transformation  steps  is  formally  represented 
in  Figure  7.1.  While  we  do  not  attempt  a  proof,  we  claim  that  these  transformations  are 
correctness  preserving  —  i.e.,  they  do  not  change  the  functionality  of  the  program. 


7.1.2  Generalizing  the  Derivation 

The tail_rec meta-program may be applied to fact through the query

?- tail_rec (λx. λy. x * y)
            (fix λfact. lam λn. if (equals n 0)
                                   1
                                   ((appl fact (n - 1)) * n))
            F_out.

which yields the tail recursive expression

F_out = appl (fix λfact1. lam λm. lam λn. if (equals n 0)
                                             m
                                             (appl (appl fact1 (n * m)) (n - 1)))
             1

For explanation-based generalization, we instead make the query

?- lemma_ebg (tail_rec (λx. λy. x * y)
                       (fix λfact. lam λn. if (equals n 0)
                                              1
                                              ((appl fact (n - 1)) * n))
                       F_out)
             top.

which leads to the assimilation of the following generalization:

!! tail_rec op
      (fix λf. lam λy. if (H1 y)
                          a
                          (op (appl f (H2 y)) (H3 y)))
      (appl (fix λf′. lam λx. lam λy. if (H1 y)
                                         x
                                         (appl (appl f′ (op (H3 y) x)) (H2 y)))
            b)
   ⇐ right_identity op b, left_identity op a, associative op.


The  result  produced  by  our  prototype  is  not  so  elegantly  expressed:  it  consists  instead  of 
a  series  of  constraint  equations.  We  took  the  liberty  of  collapsing  them  into  their  ‘most 
obvious’  solution  above  for  presentation.  The  problem  of  more  elegantly  displaying  these 
constraints  requires  further  consideration;  see  §9.2.2. 

In  either  form,  however,  the  generalization  may  be  applied  to  analogous  programs  such  as 
list  reversal: 
?- tail_rec append
            (fix λrev. lam λl. if (null l)
                                  nil
                                  (append (appl rev (tl l)) ((hd l) :: nil)))
            F_out.

which instantiates2

a = nil
b = nil
H1 = null
H2 = tl
H3 = λl. hd l :: nil

yielding the tail-recursive version

F_out = appl (fix λrev1. lam λk. lam λl. if (null l)
                                            k
                                            (appl (appl rev1 (append ((hd l) :: nil) k))
                                                  (tl l)))
             nil

The above result requires only the addition of a final simplification to make the reduction from (append ((hd l) :: nil) k) to ((hd l) :: k). Hence, the generalized fact derivation is sufficient for rev as well (except for the final simplification).
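
The generalized rule can be read as a program schema parameterized by op, the identities, and the functions H1, H2, and H3. The Python sketch below instantiates that schema for both factorial and list reversal; it is an illustration under the assumption that Python functions stand in for the higher-order variables, and the names `direct` and `tail_recursive` are introduced here. The equivalence of the two forms relies on the rule's preconditions (a is a left identity and b a right identity of op, and op is associative), which the code does not check.

```python
from operator import mul


def direct(op, a, h1, h2, h3):
    """f = fix λf. lam λy. if (H1 y) a (op (appl f (H2 y)) (H3 y))"""
    def f(y):
        return a if h1(y) else op(f(h2(y)), h3(y))
    return f


def tail_recursive(op, b, h1, h2, h3):
    """f' x y = if (H1 y) x (f' (op (H3 y) x) (H2 y)), started with x = b."""
    def f(y):
        x = b
        while not h1(y):             # the tail call becomes a loop
            x, y = op(h3(y), x), h2(y)
        return x
    return f


append = lambda xs, ys: xs + ys
fact_args = (lambda n: n == 0, lambda n: n - 1, lambda n: n)
rev_args = (lambda l: not l, lambda l: l[1:], lambda l: [l[0]])
```

Here `fact_args` and `rev_args` supply (H1, H2, H3) for the two instances, matching the instantiation H1 = null, H2 = tl, H3 = λl. hd l :: nil given for rev.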

7.2  Expressiveness  of  Higher-order  Generalization 

The  elegance  of  the  preceding  generalization  is  largely  due  to  the  expressiveness  of  our 
higher-order  language.  In  particular,  essential  restrictions  on  the  input  program  are  implicit 
in the higher-order notation: (1) that the function argument y may not appear in the ‘then’ part of the if-statement, (2) that the function f may not be recursively invoked in the ‘conditional’ or ‘then’ parts of the if, and (3) that the recursive call to f within the ‘else’ branch must be the argument to a particular function op having special properties. These restrictions are not explicit in any single transformation step, but rather are spread over the sequence of transformations embodied by the generalization. Realizing a similar result within a first-order system would be complicated by the need for these checks.

Admittedly, even with the expressivity of higher-order language, program development by transformation is a very tedious business. But this is precisely why this domain represents an attractive application for explanation-based generalization: by employing EBG to abstract derivations, one hopes to derive ‘larger-grain’ transformations (‘macro operators’, if you will). Thus, it is our belief that EBG is one means by which to develop higher-level transformations, of which the preceding example is an illustration.


2 While the derivation never establishes that a = b, this follows from the fact that a = (op a b) = b using right_identity op b and left_identity op a.


7.3  Apprentice  Learning 

The  search  space  for  the  above  derivation  is  so  complex  that  without  user  guidance  (e.g., 
via  an  explicit  meta-program,  specified  a  priori  or  interactively),  it  would  not  be  feasible  for 
a  system  to  ‘discover’  the  sequence  of  transformations  and  their  associated  contexts  with 
which  to  induce  tail  recursion.  The  transformation  problem  space  is  further  complicated  by 
the  fact  that  it  is  the  user  who  decides  when  a  derived  program  is  acceptably  ‘efficient’  (in 
this  case,  when  it  is  tail-recursive).  Within  the  transformation  paradigm,  we  are  not  in  the 
situation  of  theorem  proving  where  there  are  only  two  answers  —  “yes,  a  goal  is  provable”  or 
“no,  it  is  not.”  Instead,  the  role  of  the  user  is  two-fold:  to  guide  the  derivation  and  to  make 
value  judgments  upon  the  resulting  programs.  Currently  we  are  so  far  from  automating  the 
latter  that  transformation  systems  will  continue  to  depend  upon  user  assistance. 

That  these  value  judgments  are  not  represented  within  the  transformations  means  they  are 
not  manifest  in  the  resulting  generalizations.  There  is  an  important  underlying  assumption 
here: namely, that a sequence of transformations which leads to a ‘good’ program in one particular case (e.g., fact) is presumed to do the same for other programs to which it is applicable (e.g., rev). However, as this ‘goodness’ exists outside of the transformations themselves, there is no guarantee that a derived rule indeed yields a ‘good’ program.3

Explanation-based  generalization  is  often  labeled  ‘speed-up’  learning  in  that  EBG  extends 
the  domain  theory  by  constructing  new  rules  in  the  deductive  closure  of  that  domain  theory. 
In  other  words,  under  EBG  nothing  new  may  be  proven,  but  the  solution  of  problems 
covered  by  derived  rules  is  (hopefully)  quicker.  With  the  incorporation  of  user  interaction 
to  address  the  problem  of  intractable  search,  this  characterization  of  EBG  becomes  invalid: 
the  resulting  generalizations,  while  in  the  deductive  closure  of  the  rule  set,  are  generally 
not  accessible  without  user  guidance.  Here  EBG  serves  as  a  vehicle  to  transfer  knowledge 
from  the  user  to  the  learner.  The  combination  of  learner  and  user,  when  viewed  as  a  whole, 
still only accomplishes speed-up learning. But, after a joint derivation of fact, the learner
could  handle  rev  without  user  assistance  (presuming  that  the  system  could  find  the  final 
simplification).  That  is,  from  the  individual  perspectives  of  the  learner  and  user,  more  than 
speed-up  learning  has  taken  place  [25,  pp. 151-153]  [29,  pp. 304-305]. 

7.4  Other  work 

The above is reminiscent of the learning apprentice system defined by Mitchell [94], of which
LEAP, a learning apprentice for VLSI design, is perhaps the best known example [96, SI].

3We  are  grateful  to  Jack  Mostow  for  this  observation  [98]. 
However, our approach differs in that we are not necessarily attempting to develop heuristics
that  make  an  intractable  theory  tractable  [99,  126].  Rather,  the  client  may  simply  intend 
that  the  user’s  role  become  easier  as  derived  generalizations  are  made  available,  while  the 
fundamental  intractability  of  the  domain  remains. 

Hill also considers the application of EBG to the domain of program development [62].
However, Hill’s research utilizes a first-order encoding, and focuses upon a particular application
within formal programming: the generalization of abstract datatype representations. Our
work is directed, instead, toward the realization of a common language, λ□Prolog, in which
a multiplicity of programming and theorem proving methodologies can be realized.

In  contrast  with  the  apprentice  approach  is  that  taken  by  Steier  [123].  Steier  uses  the  Soar 
architecture  (see  §4.7  &  §6.3)  to  develop  a  series  of  algorithm  designers  that  learn  from 
experience.  Unlike  the  work  above,  his  efforts  do  not  focus  on  cooperative  problem  solution; 
rather  the  system  alone  constructs  programs  to  meet  given  criteria  using  its  knowledge-base 
of design information. Hence it is less critical that the design knowledge within his
system be easily comprehended (which may, in part, explain his success employing first-order
encodings). At the same time, the programs his framework synthesizes (e.g., sorting
algorithms)  are  significantly  more  complex  than  anything  to  which  we  have  thus  far  applied 
our  framework,  and  his  learners  exhibit  improved  performance  as  they  encounter  similar 
design  problems. 

Recently, Hagiya has formalized higher-order EBG over another higher-order language, LF [54].
LF, which stands for ‘logical framework’, is a logic for encoding other logics [60]. By
formulating EBG over LF, Hagiya realizes EBG over languages defined in LF. Like ours, his
formulation is defined in terms of higher-order unification, but he also extends the algorithm
to treat mathematical induction. Hagiya also uses LF and higher-order unification to explore
the derivation of programs and proofs by example [56]. Previously Hagiya has presented a
solution for generalizing programs (e.g., to operate on greater ranges of input values) in the
proofs-as-programs framework using higher-order type theory [55].
    !! insert_lam C
         (C fix λf. lam λn. G f n)
         (C lam λn. appl (fix λf. lam λn'. G f n') n).

    !! add_oper_right_id_1 op C
         (C λx. G x)
         (C λx. op (G x) a)
         <= right_identity op a.

    !! abstract_arg op C1 C2
         (C1 (C2 a))
         (C1 (appl (lam λm. C2 m) a)).

    !! name_function C
         (C G)
         (C fix λf. G).

    !! unfold C
         (C fix λf. G f)
         (C (G fix λf. G f)).

    !! reduce_1 C
         (C λx. appl (lam λn. G n) x)
         (C λx. G x).

    !! distribute_if_2 op C
         (C λx. λy. op (if (B x y) (E1 x y) (E2 x y)) (H x y))
         (C λx. λy. if (B x y) (op (E1 x y) (H x y)) (op (E2 x y) (H x y))).

    !! left_identity_2 op C
         (C λx. λy. op a (H x y))
         (C λx. λy. H x y)
         <= left_identity op a.

    !! reassociate_2 op C
         (C λx. λy. op (op (H1 x y) (H2 x y)) (H3 x y))
         (C λx. λy. op (H1 x y) (op (H2 x y) (H3 x y)))
         <= associative op.

    !! fold_two_3 C1 C2 C3
         (C2 fix λf. lam λm. lam λn. C3 G n m)
         (C1 λf. λx. λy. C3 G (H1 x y) (H2 x y))
         (C1 λf. λx. λy. appl (appl f (H2 x y)) (H1 x y)).


Figure  7.1:  Transformation  Rules 




    !! tail_rec op F0 F10 <=
         insert_lam
             (λC. C)
             F0 F1,
         add_oper_right_id_1 op
             (λC. lam λn. C n)
             F1 F2,
         abstract_arg op
             (λC. C)
             (λC. lam λn. op (W0 n) C)
             F2 F3,
         name_function
             (λC. appl C W)
             F3 F4,
         unfold
             (λC. appl (fix λf'. lam λm. lam λn. op (appl C n) m) W)
             F4 F5,
         reduce_1
             (λC. appl (fix λf'. lam λm. lam λn. op (C n) m) W)
             F5 F6,
         distribute_if_2 op
             (λC. appl (fix λf'. lam λm. lam λn. C m n) W)
             F6 F7,
         left_identity_2 op
             (λC. appl (fix λf'. lam λm. lam λn. if (W1 m n) (C m n) (W3 m n)) W)
             F7 F8,
         reassociate_2 op
             (λC. appl (fix λf'. lam λm. lam λn. if (W1 m n) (W2 m n) (C m n)) W)
             F8 F9,
         fold_two_3
             (λC. appl (fix λf'. lam λm. lam λn. if (W1 m n) (W2 m n) (C f' m n)) W)
             (λC. appl C W)
             (λC. λH1. λH2. op (appl C H1) H2)
             F4 F9 F10.

Figure  7.2:  Meta-program 
Chapter 8

λ□Prolog and EBG


Within this chapter, we more formally develop the λ□Prolog language and the higher-order
EBG algorithm it admits. To that end, we first extend the inference system of §3.7 to
realize λ□Prolog. Next, we introduce prototype implementations of λ□Prolog, and then of
EBG over λ□Prolog, each through an interpreter written in λProlog. While these interpreters
are too slow to be of great practical value, they serve as an abstract specification of both
the λ□Prolog logic and the EBG algorithm.

The concepts developed herein (in particular, the λProlog interpreters) are sufficiently
detailed and deep that readers will likely have to invest some time studying the presentation
(and scrutinizing the code). The more casual reviewer may wish to skim this chapter instead.


Other work. del Cerro offers another approach to incorporating modal logic within the
logic  programming  framework  that  has  nothing  to  do  with  EBG  [26,  27].  For  treatments  of 
automated  theorem  proving  in  modal  logics  outside  of  logic  programming  (and  EBG),  see 
Wallen  [131]  and  Thistlewaite  [128]. 


8.1  The Logic of λ□Prolog

The syntax of λ□Prolog is summarized by the following inductively defined classes:

    G   ::=  true | A | G1 , G2 | G1 ; G2 | D => G | ∀x[:τ]. G | ∃x[:τ]. G | □ G□

    G□  ::=  true | A | G□1 , G□2 | ∀x[:τ]. G□ | □ G□

    D   ::=  true | A | D1 , D2 | D <= G | ∀x[:τ]. D | □ D

    𝒫   ::=  ε | D. 𝒫 | !!D. 𝒫

where the new meta-variable G□ ranges over ‘boxed’ goals, and ε is the null terminal.
Although our examples have mainly employed □ at the top-level, the above definition points
out that □ is in no way restricted to outermost occurrences.
λ□Prolog does not distinguish sequences of the modal prefix; that is, □□M is equivalent
to □M. For readers familiar with modal logic, in this respect λ□Prolog may be considered
an intuitionistic version of the classical modal logic S5 [16]. However, λ□Prolog is properly
contained within S5 as it lacks the logical connective of negation (¬) and the second modal
operator of possibility ◇, which may be defined as ¬□¬. (The difference between possible
and contingent truth is similar to that between contingency and necessity: ◇A is to A as A
is to □A. λ□Prolog could equally have been formulated with unprefixed clauses representing
domain theory and clauses prefixed with ◇ standing for training theory.)

The above definition disallows goals of the form □(D => G), □(∃x. G), and □(G1 ; G2).
For □(D => G), this restriction is motivated by the lack of a modally correct strategy (i.e.,
inference rules) for solving this goal. One possible tactic would be the inference rule

{ODjuVbeG 

V  he  □  (£>=»  G) 

Note,  however,  that  the  above  permits  H-  □  (O.A  =>•  A)  to  succeed,  although  it  is  clearly 
an invalid goal. For the remaining disjunctive and existential G-forms, □(G1 ; G2) and
□(∃x. G), there exist valid inference rules:

    𝒫 ⊢θ □ G1
    ---------------------
    𝒫 ⊢θ □ (G1 ; G2)

    𝒫 ⊢θ □ G2
    ---------------------
    𝒫 ⊢θ □ (G1 ; G2)

    𝒫 ⊢θ □ (G y)
    ---------------------
    𝒫 ⊢θ □ (∃x:τ. G x)      where y ∉ free(G).

However,  we  question  the  usefulness  of  the  above  inferences,  and  whether  any  additional 
expressivity  is  provided  by  those  G-forms.  For  now,  we  have  made  the  simplifying  assumption 
of  disallowing  each  of  these  goals. 


8.2  Normal-form  for  Clauses 

The λProlog inference system of §3.7 may be extended to realize ‘pure’ λ□Prolog. (By
‘pure’ we simply mean the logical foundation of the language, that is, the logical
connectives without EBG, rule, ‘!’, etc.) Before presenting that interpretation, we first derive a
normal-form for arbitrary λ□Prolog D-forms, analogous to that given for λProlog in §3.4.2.
This normal-form is exploited by the inference system of §8.3 as well as our full λ□Prolog
implementation (Chapter 9).
The new normal-form Dnf is defined as

    Dnf  ::=  Dnf , Dnf | D∀

    D∀   ::=  ∀x. D∀ | A <= Gw | (□ D□∀) <= Gw

    D□∀  ::=  ∀y. D□∀ | A <= Gs

Under this definition, the program 𝒫 is mapped to a set of clauses, each of which is either
of the previous normal-form,

    ∀W. A <= Gw.

or of the extended normal-form,

    ∀W. (□ ∀S. A <= Gs) <= Gw.

where variables of W may appear free in Gw, and those of both W and S may appear free
in A and Gs. (We take the liberty of dropping the W and S on A as the alternative, A
subscripted with both W and S, becomes overly intrusive.) As before, an atomic clause A
becomes A <= true, but now a ‘boxed’ atomic clause □ A becomes □ (A <= true) <= true.

The validity of Dnf relies upon our decision to collapse sequences of the modal operator:
□□D goes to □D. To illustrate, the D-forms on the left are mapped to the normal D-forms
on the right:

    D-form                              Normal D-form

    q                                   q <= true
    □ q                                 □ (q <= true) <= true
    ∀x. □ q x                           ∀x. (□ (q x <= true) <= true)
    □ ∀x. q x                           (□ ∀x. q x <= true) <= true
    □ (p => □ (r => q))                 □ (q <= (p, r)) <= true
    □ ((□ p, r) => (□ q, s))            □ (q <= (□ p, r)) <= true,
                                        □ (s <= (□ p, r)) <= true
    □ (□ p => □ (□ r => □ q))           □ (q <= (□ p, □ r)) <= true
    p => ∀x. □ (s, (r x => q))          ∀x. (□ (q <= r x) <= p),
                                        ∀x. (□ (s <= true) <= p)


In §3.6 we developed a λProlog program mapping arbitrary λProlog D-forms to a normal-form.
In Figures 8.1 & 8.2, we give an analogous mapping (again within λProlog) for
λ□Prolog D-forms. As before, the predicate requantify moves quantifiers down; the
predicate conjoin collects preconditions; and the predicate ndform coordinates the other
predicates. Now, however, each of these predicates has a weak (prefixed by ‘w’) and a strong
(prefixed by ‘s’) version. The ‘s’ is indicative of those components that are nested under a
□ within the original D-form, while the ‘w’ is for components not so nested. In fact, the
new normal-form is most easily viewed as two levels of the preceding normal-form: one for
the ‘boxed’ portion; the other, for the ‘unboxed’ (although both Gw and Gs may themselves
contain □).
    wrequantify (∀x. D1 x , D2 x) (D1' , D2')  <=  !, wrequantify (∀ D1) D1',
                                                     wrequantify (∀ D2) D2'.
    wrequantify D D.

    srequantify (∀x. D1 x , D2 x) (D1' , D2')  <=  !, srequantify (∀ D1) D1',
                                                     srequantify (∀ D2) D2'.
    srequantify (∀x. □ D x) (□ D')             <=  !, srequantify (∀x. D x) D'.
    srequantify D D.

    wconjoin G (D1 , D2) (D1' , D2')  <=  wconjoin G D1 D1',
                                          wconjoin G D2 D2'.
    wconjoin G (∀ D) (∀ D')           <=  ∀x. wconjoin G (D x) (D' x).
    wconjoin G (A <= true) (A <= G).
    wconjoin G (A <= G1) (A <= (G , G1)).

    sconjoin G (D1 , D2) (D1' , D2')  <=  sconjoin G D1 D1',
                                          sconjoin G D2 D2'.
    sconjoin G (∀ D) (∀ D')           <=  ∀x. sconjoin G (D x) (D' x).
    sconjoin G (□ D) (□ D')           <=  sconjoin G D D'.
    sconjoin G (A <= true) (A <= G).
    sconjoin G (A <= G1) (A <= (G , G1)).

Figure 8.1: Clause normal-form conversion (Part 1).



    wndform (D1 , D2) (D1' , D2')  <=  !, wndform D1 D1',
                                         wndform D2 D2'.
    wndform (∀ D) D''              <=  !, (∀x. wndform (D x) (D' x)),
                                         wrequantify (∀ D') D''.
    wndform (D <= G) D''           <=  !, ndform D D',
                                         wconjoin G D' D''.
    wndform (□ D) D''              <=  !, sndform D D',
                                         conjoin_true D' D''.
    wndform A (A <= true).

    sndform (D1 , D2) (D1' , D2')  <=  !, sndform D1 D1',
                                         sndform D2 D2'.
    sndform (∀ D) D''              <=  !, (∀x. sndform (D x) (D' x)),
                                         srequantify (∀ D') D''.
    sndform (D <= G) D''           <=  !, sndform D D',
                                         sconjoin G D' D''.
    sndform (□ D) D'               <=  !, sndform D D'.
    sndform A (□ (A <= true)).

    conjoin_true (D1 , D2) (D1' , D2')  <=  conjoin_true D1 D1',
                                            conjoin_true D2 D2'.
    conjoin_true (□ D) (□ D <= true).

    ndform D D'  <=  wndform D D'.


Figure  8.2:  Clause  normal-form  conversion  (Part  2). 
Vt-jG

v  hy  (□  G) 


(VW.  A  <=  Gw)  €  nf(P)  0<rwA  =  0A' 

^7^ 


0?  PJ  0<rwGw 


where  dom(cryv)  C  W, 

dom(0)  fl  W  =  0, 

and  <7yv  &  0  are  minimal. 


(VW.  (□  VS.  A  <=  Gs)  <i=  Gw)  €  nf(7>)  0<rw<r« A  =  9 A' 

— — — 


9V  l-J  0<rwGw  ,  Oow<tsGs 


where  dom(aw)  Q  W, 
dom(<rs)  C  S, 

dom(0)  n  W  =  0,  dom(0)  n  S  =  0, 
and  <ryy,  0s  &  0  are  minimal. 


Figure  8.3:  Partial  ‘weak’  inference  rules  for  AD Prolog. 


One use of higher-order matching exploited within Figures 8.1 & 8.2 deserves further
explanation: By matching D against the λ-term (∀x. D1 x , D2 x), we ensure that D is a
universally quantified conjunction, and that D1 and D2 are bound to the appropriate functions of x
within D. For example, for D = (∀x. a x , b x), the preceding instantiates D1 = lam λx. a x
and D2 = lam λx. b x. Similarly, for D = ∀x. a x, unifying D with ∀ D1 instantiates
D1 = lam λx. a x.


We do not herein attempt to formally establish that all λ□Prolog D-forms can be mapped to
Dnf, although the code provides some evidence for this. Nor, for that matter, do we argue
further that the mapping to Dnf is meaning-preserving. A proof could take the form of an
extended set of distributive transformations analogous to those presented in §3.6.
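
As a rough illustration of the two-level normal form, the following Python sketch flattens quantifier-free D-forms. It is not the λProlog code of Figures 8.1 & 8.2: the tuple encoding, the function name normal_form, and the four-field clause representation are assumptions made for this example, goals are treated as opaque values, and universal quantifiers are omitted entirely.

```python
# Hypothetical encoding of quantifier-free D-forms (an assumption of this sketch):
#   'q'              an atomic clause
#   ('and', D1, D2)  the conjunction D1 , D2
#   ('if', D, G)     the rule D <= G (equivalently G => D)
#   ('box', D)       the boxed clause □ D
# A normal-form clause is (boxed, head, gs, gw):
#   boxed=False stands for  head <= gw            (gs is empty)
#   boxed=True  stands for  □ (head <= gs) <= gw

def normal_form(d, boxed=False, gs=(), gw=()):
    """Return the list of normal-form clauses for the D-form d."""
    if isinstance(d, str):
        return [(boxed, d, gs, gw)]
    tag = d[0]
    if tag == 'and':
        # Each conjunct inherits the preconditions collected so far.
        return normal_form(d[1], boxed, gs, gw) + normal_form(d[2], boxed, gs, gw)
    if tag == 'if':
        # Preconditions met under a box are 'strong'; the others are 'weak'.
        if boxed:
            return normal_form(d[1], True, gs + (d[2],), gw)
        return normal_form(d[1], False, gs, gw + (d[2],))
    if tag == 'box':
        # Nested boxes collapse: □□D goes to □D.
        return normal_form(d[1], True, gs, gw)
    raise ValueError(f"unknown D-form: {d!r}")


# □ (p => □ (r => q))  maps to  □ (q <= (p, r)) <= true
print(normal_form(('box', ('if', ('box', ('if', 'q', 'r')), 'p'))))
```

Under these assumptions the propositional rows of the table in this section come out as listed there, up to the order in which the clauses are produced.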


8.3  Inference System for λ□Prolog

It is important to distinguish the programming language λ□Prolog from the process that
produces explanation-based generalizations of λ□Prolog computation. Within this section
we further develop the former by extending the inference system of §3.7 to implement pure
λ□Prolog. To that end, we split the ⊢ relation into ⊢w and ⊢s: the former for the derivation
of unboxed (‘weak’) goals; the latter for that of boxed (‘strong’) goals. For example, from the
clause p we cannot derive the goal □ p, but the goal p does follow from the clause □ p. Initial
queries then are phrased as ⊢w G, but the inference ⊢w □ G is defined in terms of ⊢s G.
V  l-‘  true


V  h$  Gx  ev  9G2 

Vt%(Gx,G3) 

V  *-'e  Gy 

V  t-g  (Vz:r.  Gx)  where  y  £  free(0G), 

y  £  fr tt(9V),  and  y  dom(0) 

T>*-\G 
V  I-}  (□  G) 

(VW.  (□  VS.  A  <=  Gs)  ■<=  Gw)  €  nf(P)  9<rwasA  =  9 A'  9V  9awG w,  □  9awasGi 

_____ 

where  dom(cryv)  C  W, 
dom(tr5)  C  S, 

dom(^)  n  W  =  0,  dom(0)  n  S  =  0. 
and  <ryv,  <75  &  9  are  minimal. 


Figure  8.4:  ‘Strong’  inference  rules  for  ADProlog. 

As one might expect, ⊢w largely follows the definition of ⊢: In Figure 8.3 we list only those
rules that have changed. For completeness, we include the full definition of ⊢s in Figure 8.4,
although it too largely follows ⊢; in fact, only the final three rules differ. Of particular
importance is that only D-forms containing □ are used to establish ⊢s G: we ensure the
necessary truth of G by requiring that each of its deriving clauses is also necessarily true.

Informal λ□Prolog interpreter. For those readers preferring a ‘logic programming’
oriented description of the interpretation of λ□Prolog, we offer the following insight into the
preceding inference rules. Subsequent sections will further explicate λ□Prolog.

As within our informal λProlog interpreter of §3.4.2, λ□Prolog goals are herein reduced to an
atomic subgoal Ga for solution. And as within the inference rules, necessary (boxed) goals
are strongly solved; contingent ones, weakly solved. For strong solution, applied clauses
must be of the extended normal form (i.e., contain □), since only a necessarily true clause
can establish a necessarily true goal. On the other hand, clauses of either normal-form are
relevant for weak solution.

From Ga and the normal D-form

    ∀W. (□ ∀S. A <= Gs) <= Gw.

(unboxed D-forms are handled as before), we proceed as follows:

•  Create new logical variables W' and S' for each universal variable in W and S, and
then substitute W' for W and S' for S in A, Gw, and Gs, yielding A', Gw', and Gs',
respectively.

•  Unify A' and Ga.

•  For strong solution, solve (Gw' , □ Gs'). For weak solution, solve (Gw' , Gs').
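
The steps above can be sketched for the propositional case, where no variables need to be renamed and unification reduces to equality of atoms. This is a minimal Python sketch, not the inference system itself: the clause tuples (boxed, head, gs, gw), the function name solve, and the depth bound are assumptions of the example. It also assumes one particular reading of the last step, namely that the outer conditions Gw are solved weakly even during a strong solution, while Gs is solved strongly.

```python
# A normal-form clause is assumed to be a tuple (boxed, head, gs, gw):
#   boxed=False stands for  head <= gw            (gs is empty)
#   boxed=True  stands for  □ (head <= gs) <= gw
# A goal is an atom (a string) or ('box', goal).

def solve(program, goal, strong=False, depth=25):
    """Return True if goal is derivable; strong=True demands a strong proof."""
    if depth == 0:                       # crude guard against looping programs
        return False
    if isinstance(goal, tuple) and goal[0] == 'box':
        # A boxed goal must be strongly solved.
        return solve(program, goal[1], True, depth - 1)
    for boxed, head, gs, gw in program:
        if head != goal:
            continue
        if strong and not boxed:
            continue                     # only boxed clauses give strong proofs
        # Strong solution: solve (Gw , □ Gs).  Weak solution: solve (Gw , Gs).
        weak_ok = all(solve(program, g, False, depth - 1) for g in gw)
        body_ok = all(solve(program, g, strong, depth - 1) for g in gs)
        if weak_ok and body_ok:
            return True
    return False


# Clauses: □ (q <= r) <= p,   p,   □ r
program = [(True, 'q', ('r',), ('p',)), (False, 'p', (), ()), (True, 'r', (), ())]
print(solve(program, ('box', 'q')))   # q has a strong proof
print(solve(program, ('box', 'p')))   # p is only contingently true
```

The second query fails because the only clause for p is unboxed, mirroring the observation that □ p does not follow from the clause p.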

8.4  Introduction  to  the  Meta-Interpreters 

While the preceding inference system provides a formal characterization of the λ□Prolog
logic, we have yet to treat higher-order EBG. To that end, we develop a program that
implements EBG over λ□Prolog computation. This implementation consists of an extended
λ□Prolog interpreter written in λProlog. To simplify discussion, we first present, in §8.5, the
basic λ□Prolog interpreter without the generalizing component. This λ□Prolog interpreter
provides a formal operational specification of λ□Prolog which, for the most part, mirrors that
given by the inference system of §8.3. Due to the closeness of the correspondence between
the object-language (λ□Prolog) and the meta-language (λProlog), we shall often use the
more descriptive term ‘meta-interpreter.’

Our λ□Prolog meta-interpreter is extended to perform EBG within a second prototype in
§8.6. This expanded meta-interpreter exemplifies the generalization algorithm, and has
reproduced most of the examples contained within this dissertation. (Others were derived
under the full implementation described in Chapter 9.) So that our presentation is more
accessible, we have deferred some less pertinent details of the generalizing meta-interpreter
to Appendix A.4.

Finally, in §8.7 we further extend this meta-interpreter to admit operationality criteria.

We chose to prototype λ□Prolog and higher-order EBG in this manner to facilitate
experimentation with alternative formulations of both the language and the generalization
algorithm, and moreover, to provide a formal specification of each. The prototypes are
sufficiently slow and limited, however, that a more direct implementation was eventually
required (Chapter 9).

8.4.1  Accessing  the  Logic  Program 

To run examples under the meta-interpreters to follow, the λ□Prolog program 𝒫ob to be
interpreted must be available as data. This is accomplished by assuming hyp D for each
clause D of 𝒫ob prior to invoking the meta-interpreter. hyp addresses the need for reification:
the mapping from program to data. (Reification is the inverse of reflection; see §5.3.1.)
Within Prolog reification is accomplished with clause D G, which matches against clauses
in the program-base, instantiating D to the head and G to the corresponding body. λProlog
does not provide a clause construct.1 Hence in λProlog, to manipulate programs and then
run the derived results directly requires that two versions of the program be present: the
initial one 𝒫ob, which is available as data (via an indexing predicate such as hyp), and the
reflected one added to the logic program 𝒫.

Figure 8.5: Meta-interpreter without EBG: Goal analysis.

hyp allows the meta-interpreter to enumerate the program 𝒫ob to be interpreted with
λProlog’s backtracking search (by successively solving the goal hyp D), although obviously
the performance of such an approach suffers in comparison with the schemes employed by
more standard logic programming implementations (e.g., hashing on the name of the
predicate heading an atom).

As mentioned above, the variables of clauses asserted with hyp must be explicitly universally
quantified: the scope of λProlog’s implicit quantification is insufficient as it includes hyp
as well. (The ‘!!’ convention, while part of the eventual system, only functions at the
top-level, and moreover, is not realizable within the prototype interpreter.) What follows is a
portion of the ubiquitous suicide example in the form recognized by the meta-interpreter:

    hyp (□ ∀a. ∀b. ∀c. kill a b <= hate a b, possess a c, weapon c).

    hyp (gun obj1).


8.5  The Meta-Interpreter

Our λ□Prolog interpreter is divided between two sets of clauses: the solve predicates of
Figure 8.5, which reduce a given λ□Prolog goal G to some number of atomic subgoals (Ga’s),
and the match predicates of Figure 8.6, which attempt to derive a pending atomic subgoal
Ga from the program 𝒫ob.

1 A λProlog clause would simply take the form clause D, but would also be more complex in that it
presumably would explicitly quantify universal variables. There does not appear, however, to be any logical
problem with adding clause to λProlog.

The goal reduction performed by solve is again split between two sets of clauses: wsolve for
‘weak-solve’ and ssolve for ‘strong-solve.’ This distinction, analogous to that made between
⊢w and ⊢s, arises from the more stringent proof required by the necessary truth of ‘boxed’
goals: {□ p} ⊢ p, but not {p} ⊢ □ p. The top-level predicate is wsolve, because goals are
contingent until a □ has been encountered. Each of the Ga’s derived through solve will
require either a ‘strong’ or ‘weak’ proof, which is realized through the corresponding match
predicates, wmatch and smatch.

Within the solve predicates, the solution of a λ□Prolog goal is largely realized by the
corresponding λProlog construct. For example, a λ□Prolog conjunction (G1 , G2) is derived
by establishing the λProlog conjunction of the solutions to G1 and G2. Similarly, a
universally quantified λ□Prolog goal is universally derived under λProlog. And an implicational
goal D => G is proven by first assuming D, and then attempting to derive G. Such sharing
between object-language (λ□Prolog) and meta-language (λProlog) makes for elegant
interpretation. (The rules of ssolve do not address the range of λProlog connectives because of
the additional restrictions placed upon boxed goals; see §8.1.)

In the final clauses of wsolve and ssolve, the pending goal has been reduced to an atomic Ga.
This is ensured by our use of the cut operator ‘!’ described in §3.5. Cut’s only effect within
solve is to ensure that Ga is indeed atomic: if Ga instead contained a logical connective, ‘!’
would not have permitted the interpretation to ‘fall through’ to its present position, as one
of the preceding clauses would have been chosen.

Through the predicate hyp, the final clauses of wsolve and ssolve select a potentially
pertinent clause D from the program, which the match predicates then attempt to apply
in the proof of Ga. The selection of D is inefficient in that each clause of 𝒫ob is simply tried
in order until one is found that derives Ga. (For the purposes of this meta-interpretation,
D-forms need not be in normal-form.) As we shall see, in the course of deriving Ga from
D, match may produce subgoals (Gs’s) that must be subsequently solved to complete the
proof.

The match predicates analyze the selected program clause D to determine if it is applicable
in the solution of Ga. For a conjunction (D1 , D2), the (intuitionistic) logic programming
paradigm dictates that either D1 or D2 individually derives Ga (although both D1 and D2
are available for the derivation of any resulting subgoals). A universally quantified clause
∀ D (or equivalently, ∀x. D x) is reduced by replacing the bound variable with a new logical
variable Y, which may become instantiated in the course of the proof: for example, the
clause ∀z. weapon z <= gun z becomes weapon Y <= gun Y. If D is a rule D' <= G', we
conjoin G' with the subgoals that arise from establishing that D' implies Ga: for the clause
weapon Y <= gun Y, the interpreter first determines whether weapon Y establishes Ga,
and then attempts to solve gun Y. When smatch encounters a □ in the program, the
nested clause need only be weakly matched with the current goal. This is because proving a
goal ‘strongly’ simply requires that any utilized clauses must themselves be necessarily true.
The resulting subgoal Gs is, however, boxed as it too must be strongly proved. On the other
hand, wmatch ignores □’s within D, because therein we are only concerned with a weak
proof.

Figure 8.6: Meta-interpreter without EBG: Clause analysis.

In the final clause of wmatch, the unification of an atomic Da and Ga is attempted: for
example, unifying the goal weapon obj1 with the clause weapon Y. This is analogous
to the unification of a goal and clause head under a Prolog interpretation. If successful,
this has the effect of ‘returning’ the accumulated conjunction of subgoals Gs (in this case,
gun obj1) to the last clause of solve, which then derives Gs recursively. The predicate
smatch is, however, missing the analogue to the last clause of wmatch. This is because a
contingent atomic clause cannot be used to prove a necessary atomic goal; that is, the clause
p is not sufficient to derive □ p.

This concludes the discussion of the basic λ□Prolog meta-interpreter. The next step is
extending it to perform EBG.


8.6  The  Generalizing  Meta-Interpreter 

Within this section we extend the λ□Prolog meta-interpreter of §8.5 to perform EBG. This
section focuses on developing the most relevant and interesting aspects of the prototype; the
unabridged meta-interpreter may be found in Appendix A.4.

Kedar-Cabelli & McCarty produce first-order explanation-based generalizations within
Prolog via an augmented meta-interpreter [71]. As we shall take a similar approach, we briefly
review Kedar-Cabelli & McCarty’s implementation: Under its second formulation (pp. 387-388),
their meta-interpreter, prolog_ebg, solves a particular query in parallel with the
construction of the associated explanation-based generalization. The predicate prolog_ebg
takes three arguments: the particular query G, the generalized query GG, and the
conjunction of generalized conditions DD sufficient to establish GG.

Each ‘rule’ applied by prolog_ebg in the proof of G is similarly applied in the proof of GG.
Leaves of the Prolog computation that arise in the course of deriving GG (i.e., those goals
established by ‘facts’) are accumulated in the conjunction of sufficient conditions DD. The
resulting explanation-based generalization is then GG <= DD, where for example

    GG = kill x x

    DD = depressed x, buy x y, gun y

No explicit representation of the proof need be constructed; it is inherent in the Prolog
search.

Figure 8.7: Generalizing meta-interpreter: Goal analysis.
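
The first-order scheme reviewed above can be sketched in Python as follows. This is an illustrative reimplementation of the idea, not Kedar-Cabelli & McCarty’s Prolog code and not the higher-order algorithm of this chapter. The rules for hate, possess, and weapon, the facts about john, and the set of operational predicates are assumptions filled in from the usual form of the suicide example; the names unify, ebg, and generalize are likewise hypothetical. The specific and generalized proofs share each renamed rule but keep separate substitutions (s and gs).

```python
import itertools

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Return an extended substitution, or None on failure (no occurs check)."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def resolve(t, s):
    t = walk(t, s)
    return tuple(resolve(x, s) for x in t) if isinstance(t, tuple) else t

# Assumed domain theory and training instance (only the kill rule and
# gun obj1 appear in the text).
RULES = [
    (('kill', 'A', 'B'), [('hate', 'A', 'B'), ('possess', 'A', 'C'), ('weapon', 'C')]),
    (('hate', 'W', 'W'), [('depressed', 'W')]),
    (('possess', 'U', 'V'), [('buy', 'U', 'V')]),
    (('weapon', 'Z'), [('gun', 'Z')]),
]
FACTS = [('depressed', 'john'), ('buy', 'john', 'obj1'), ('gun', 'obj1')]
OPERATIONAL = {'depressed', 'buy', 'gun'}
_fresh = itertools.count()

def rename(head, body):
    n = next(_fresh)
    def r(t):
        if isinstance(t, tuple):
            return tuple(r(x) for x in t)
        return f"{t}_{n}" if is_var(t) else t
    return r(head), [r(b) for b in body]

def ebg(goal, gen, s, gs):
    """Prove goal under s; mirror each step on gen under gs; yield leaves."""
    if goal[0] in OPERATIONAL:
        for fact in FACTS:
            s2 = unify(goal, fact, s)
            if s2 is not None:
                yield s2, gs, [gen]      # the generalized leaf is not instantiated
        return
    for head, body in RULES:
        head, body = rename(head, body)
        s2, gs2 = unify(goal, head, s), unify(gen, head, gs)
        if s2 is not None and gs2 is not None:
            yield from ebg_all(body, s2, gs2)

def ebg_all(goals, s, gs):
    if not goals:
        yield s, gs, []
        return
    for s2, gs2, c1 in ebg(goals[0], goals[0], s, gs):
        for s3, gs3, c2 in ebg_all(goals[1:], s2, gs2):
            yield s3, gs3, c1 + c2

def generalize(goal, gen):
    for _, gs, conds in ebg(goal, gen, {}, {}):
        return resolve(gen, gs), [resolve(c, gs) for c in conds]
    return None

print(generalize(('kill', 'john', 'john'), ('kill', 'X', 'Y')))
```

Under these assumptions the result has the shape kill x x <= depressed x, buy x y, gun y, up to the names of the generated variables: the two arguments of kill are identified because the assumed hate rule forces them to be, while john and obj1 never enter the generalized substitution.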

As in the first-order approach of Kedar-Cabelli and McCarty [71], our generalizing
meta-interpreter develops two parallel proofs simultaneously: a proof of G and a generalized
proof of GG. Again these proofs are not explicitly constructed; rather they are implicit in the
λProlog search. In the course of deriving G and GG, the implementation accumulates the
conjunction of generalized clauses DD sufficient to establish GG, that is, the leaves of
the generalized proof. Figure 8.7 contains the extended solve predicates of the generalizing
interpreter, which also accept three arguments: the goal G (instantiated), the generalized
goal GG (uninstantiated), and the conjunction of generalized sufficient conditions DD
(uninstantiated). The resulting explanation-based generalization is then !! GG <= DD.

In the extended wsolve and ssolve, the decomposition of G guides the corresponding
instantiation of the generalized goal GG. It is only at the atomic level where G and GG
diverge. (An exception is made for the handling of implicational goals D' => G', which is
simplified by locally treating D' as a part of 𝒫_dt.) The MG’s (for ‘meta-subgoal’) in the
final clauses of solve assume a role analogous to that played by subgoals in the previous
meta-interpreter; that is, MG’s retain subproof tasks for later derivation. The transition
from the Gs’s of the first interpreter to the current MG’s comes out of the need to maintain
both G and GG for subsequent solution. The straightforward clauses meta_wsolve and
meta_ssolve that interpret MG’s are given within Figure 8.9.

Figure 8.8: Generalizing meta-interpreter: Clause analysis.

After solve selects a clause D with which to derive Ga, the extended match predicates of
Figure 8.8 attempt to apply D in the solution of Ga. But in the course of deriving Ga, the
new match also yields a generalized atomic goal GGa and a generalized clause DD sufficient
to establish GGa. Within the final clause of wmatch, where Da is unified with Ga, DD is
instead unified with GGa. That neither the pair Ga and GGa nor the pair Da and DD are
unified is essential for generalization: DD and GG need only be instantiated to the point
that GG necessarily follows from DD.

How then do any of the constants of D (first or higher-order) ever end up in GG or DD?
The answer is that unless some of the D’s employed in the proof are boxed, none ever will.
In the matching of boxed clauses, D and DD are explicitly unified in the invocation of
bmatch (for ‘boxed-match’): within the suicide problem, for example, both D and DD are
bound to ∀z. weapon z <= gun z. (The additional predicate bmatch is required to handle
subtle differences in the matching of instantiated DD’s.) While D and DD are initially
equivalent within bmatch, they may later diverge as distinct new logical variables x and y
are substituted for their universally quantified variables. This is because D is to be unified
with Ga, while DD is to be unified with GGa: again for the weapon clause, D’s logical variable
becomes bound to obj1, while that of DD remains uninstantiated.

Figure 8.9: Generalizing meta-interpreter: Meta-Goal solution.

As both boxed and unboxed clauses are used in the proofs we have developed, the reader
might rightfully expect both to appear in DD, the resulting sufficient conditions of the
generalization. In fact, this is the case: for the goal □ a , b => (a , b), our generalizing
meta-interpreter produces the explanation-based generalization (a , G) <= □ a , G. However,
boxed clauses are ‘necessarily’ true, and hence need not be re-checked during the application
of a derived rule. Instead, it is the conjunction of the utilized unboxed clauses which
constitutes the simplest expression of the sufficient conditions for GG. Removing boxed
clauses from DD requires a simple reduction predicate, whose definition may be found in
Appendix A.4. The result of simplifying the above is (a , G) <= G. We take this approach as
it is easier to remove boxed clauses from the completed generalization than to avoid their
initial incorporation, since only top-level boxed clauses could reasonably be recognized in
solve. (For the meta-interpreter implementation, these simplification predicates will
unavoidably destroy degenerate generalizations such as the above, i.e., ones with variables
at the top-level.)


8.7 Operationality

Incorporating operationality criteria within the preceding prototype requires providing the
meta-interpreter with access to an operationality predicate oper. The revision involves
inserting the following clause at the head of the solve predicates. We illustrate the change
for wsolve; an analogous change is necessary in ssolve:

wsolve G GG DD <= oper G, !,
                  DD = GG, wsolve_orig G.

where wsolve_orig is the version of wsolve that does not perform EBG, i.e., that given
within Figure 8.5. The computation proceeds in the same manner, but EBG is suspended
during the solution of operational subgoals. Instead, DD is bound to the current generalized
goal GG, which, because it is operational, becomes one of the sufficient conditions of the
resulting generalization. The above clause is expected to be used for recursive invocations
of solve, i.e., during the solution of subgoals; if a top-level goal is made operational, the
resulting explanation-based generalization is trivial.

It  is  the  user’s  responsibility  to  specify  the  computation  necessary  to  determine  oper  of 
particular  goals.  Should  no  clauses  be  provided  for  oper,  the  above  implementation  behaves 
in  the  same  manner  as  the  original.  And  since  the  definition  of  oper  may  be  extended  in  the 
course  of  the  current  computation,  this  formulation  permits  dynamic  operationality  criteria 
(although  it  does  not  support  the  expression  of  preconditions  for  dynamic  operationality 
criteria;  see  §4.5). 


8.8  Assimilation 

We demonstrated within Chapter 5 that rule cannot be implemented within λProlog. On
the other hand, the generalizing interpreter of §8.6 illustrates that λProlog is sufficient for
the realization of EBG itself. The question is, then, whether this meta-interpreter can be
extended so that explanation-based generalizations may be learned (i.e., assimilated). The
answer is no, and the reason, as the reader might expect, is λProlog’s inability to universally
generalize: consider that the tentative implementation of lemma_ebg

lemma_ebg G K <= wsolve G GG DD,
                 ((GG <= DD) => K).

typically allows the assimilated generalization to be applied only once (i.e., for one
instantiation of its variables).
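
The "only once" behaviour can be illustrated with a small first-order sketch in Python. It is a hypothetical analogy rather than λProlog: the lemma p(X) is assumed either with a single shared logical variable X (as in the tentative lemma_ebg, where no quantifier is introduced) or as ∀X. p(X), where each use receives a new variable. The names Var, unify, use_shared and use_quantified are assumptions made for this example.

```python
class Var:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

def walk(t, s):
    while isinstance(t, Var) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Extend substitution s so that a and b agree, or return None."""
    a, b = walk(a, s), walk(b, s)
    if a is b:
        return s
    if isinstance(a, Var):
        return {**s, a: b}
    if isinstance(b, Var):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return s if a == b else None

def use_shared(goals):
    """Lemma p(X) with X one shared logical variable: bindings persist across uses."""
    lemma, s, results = ("p", Var("X")), {}, []
    for g in goals:
        s1 = unify(g, lemma, s)
        results.append(s1 is not None)
        if s1 is not None:
            s = s1
    return results

def use_quantified(goals):
    """Lemma ∀X. p(X): every use instantiates X with a new logical variable."""
    s, results = {}, []
    for g in goals:
        s1 = unify(g, ("p", Var("X")), s)
        results.append(s1 is not None)
        if s1 is not None:
            s = s1
    return results

print(use_shared([("p", "a"), ("p", "b")]))      # second use fails: X is already bound to a
print(use_quantified([("p", "a"), ("p", "b")]))  # both uses succeed
```

The shared-variable version succeeds for p a and then fails for p b, which is the limitation that motivates universal generalization through rule.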

It is pleasing that lemma_ebg can be implemented in terms of rule and solve:

lemma_ebg G K <= rule (wsolve G GG DD)
                      (∀G ∀GG ∀DD. wsolve G GG DD => (GG <= DD))
                      K.


Similarly, consider the following encoding of rule_ebg:

rule_ebg G (□ F) K <= rule (wsolve G GG DD, instan F GG E)
                           (∀G ∀GG ∀DD ∀F ∀E.
                              (wsolve G GG DD, instan F GG E)
                              => (E <= DD))
                           K.

where

instan (∀ F) G D <= instan (F x) G D.
instan (G => D) G D.

The instan predicate simply replaces universally quantified variables with new logical
variables, and then does the appropriate unification between the generalized goal and the goal
associated with the forward inference step.
8.9 The Barcan Formula


The Barcan formula, for a higher-order logic, is as follows:

(∀x. □ (P x)) => (□ ∀x. P x)

While the converse of Barcan, that is,

(□ ∀x. P x) => (∀x. □ (P x))

is true in all modal logics, the validity of the Barcan formula varies. Under λ□Prolog’s
inference system, the Barcan formula is indeed valid.²,³

Although in terms of the logic there is no difference between the left- and right-hand sides
of Barcan, the generalization algorithm does distinguish the two. That is, the relative order
of □ and ∀, while not affecting provability, can affect EBG! In particular, variables whose
universal quantifiers are outside of □ are not abstracted within the generalized proof. To
illustrate, we once again employ the suicide example of §4.2: If the clause

!! weapon z <= gun z.

which is simply shorthand for

□ ∀z. weapon z <= gun z.

were replaced instead with

∀z. □ (weapon z <= gun z).

the resulting generalization becomes

!! kill x x <= depressed x , buy x obj1 , gun obj1.

The reader’s initial impression may be that the above violates the partition established
between domain and training theory, since obj1 only appears in the latter, but yet makes
its way into the derived rule. Observe, however, that the universal quantification ∀z occurs
outside of the □, and thus the clause incorporated within the generalized proof does not
include that quantification. Instead, the utilized clause may be viewed as


□ (weapon z <= gun z).

² It is also the case that S5 includes Barcan.

³ Disallowing the Barcan formula would significantly complicate the inference system. Consider that the
ordering of ∀ and □ is presently irrelevant. An alternative inference system not admitting Barcan would
have to maintain the additional context of whether or not universal variables occurred within the scope of
a □.

Similarly, the manner in which universal variables are treated in λProlog would complicate disallowing
Barcan. The problem is that quantifiers are not maintained during computation, but rather are replaced
by special place-holding uvars, for ‘universal variables.’ This means that the problem of maintaining scope
inside or outside of □ would require two distinct kinds of uvars: one for x, and the other for y within
∀x. □ ∀y. D.

Because the universal quantification occurs within the training theory, we chose not to
universally generalize z within the domain theory. This decision really just provides an
additional level of expressiveness: for the large majority of situations, users will presumably
want the original interpretation, which is easily achieved with the !! notation.

As an aside, if we were to instead require that EBG not discriminate between the left- and
right-hand sides of Barcan, we would thereby avoid the need for the !! notation: consider
that

!! weapon z <= gun z.

is equivalent to

□ ∀z. weapon z <= gun z.

which is equivalent (under the Barcan formula) to

∀z. □ (weapon z <= gun z).

which, in turn, can be expressed as simply

□ (weapon z <= gun z).

by relying upon λProlog’s implicit universal quantification of top-level clauses. It is unclear,
however, whether this alternative EBG algorithm could be realized through revisions to the
generalizing meta-interpreter of §8.6.
Chapter 9

λ□Prolog Implementation


In §3.4.3 we made mention of eLP, the implementation of λProlog written in Common Lisp
and developed at Carnegie Mellon University by Conal Elliott and Frank Pfenning in the
framework of the Ergo project [38]. Then within §8.5 and §8.6, we introduced interpreters
written in λProlog for the modal logic λ□Prolog and for that logic extended with
explanation-based generalization.

These prototype implementations of λ□Prolog have been extremely valuable for experimenting
with different variations of both the logic and the EBG algorithm, and moreover for
providing a formal specification of each. They are, however, extremely slow due to the
additional level of interpretation, which also precludes the application of lower-level
implementation strategies (such as hashing rules based upon predicate names). Such optimizations
are not directly expressible within λProlog (that is, not without substantially complicating
the encoding). Furthermore, these meta-interpreters are not sufficiently powerful to handle
λProlog primitives (e.g., cut and arithmetic), or to realize the !! convention, or to implement
our primitives for initiating and controlling generalization and assimilation, rule and
rule_ebg (Chapter 5).

We have addressed these deficiencies by extending our existing λProlog interpreter, eLP,
with □, !!, rule, and rule_ebg. The nature of these extensions is the topic of this chapter.


9.1  Implementing  □ 

The  first  addition  we  made  to  eLP  was  the  modal  logic  operator  □.  The  necessary  extensions 
to the eLP interpreter largely follow the abstract λ□Prolog interpreter developed in §8.5:
goals  are  subject  to  two  levels  of  solution  —  strong  (for  boxed  goals)  and  weak  (for  unboxed), 
and  similarly,  boxed  clauses  and  subclauses  applied  in  the  course  of  a  proof  are  distinguished 
from  their  unboxed  counterparts.  Rather  than  supplying  two  pairs  of  COMMON  LISP  routines 
analogous  to  wsolve  &  ssolve  (which  reduce  complex  goals  to  atomic  ones),  and  wmatch  & 
smatch  (which  use  the  logic  program  to  derive  atomic  goals),  we  simply  included  a  boolean 
context  argument  within  the  corresponding  interpreter  routines. 
λProlog clause normal-form. eLP employs the normal-form representation of λProlog
clauses described in §3.6: recall that D_nf may be defined as

D_nf ::= D_∀ | D_nf , D_nf
D_∀  ::= D_⇐ | ∀x. D_∀
D_⇐  ::= A <= G

In this way, 𝒫 is mapped to a set of clauses such that each D may be represented as an
atomic clause-head A, a goal precondition G (which implies A), and a list of universal
variables X over which D is universally quantified. The motivation for using D_nf is two-fold:

(1)  clauses  may  thereby  be  simply  represented  as  three  components  —  A,  G,  and  X,  and 

(2)  the  relevance  of  a  given  clause  to  a  particular  atomic  goal  Ga  may  be  easily  determined 
by  unifying  Ga  and  A. 

Once  general  G-forms  have  been  reduced  to  atomic  ones,  the  eLP  interpreter  operationally 
proceeds  as  follows: 

• Select a set of clauses from 𝒫 which may be applicable to the solution of Ga. This
step includes ensuring that A and Ga begin with the same predicate, and potentially
makes use of further matching optimizations such as indexing (whereby subterms of
Ga are matched against pre-selected subterms of A).

• Create new logical (or existential) variables X̂ for each universal variable in X, and
then substitute X̂ for X in A and G, yielding Â and Ĝ.

• Unify Â and Ga. If unsuccessful, backtrack and choose another clause.

• Recursively solve Ĝ.

λ□Prolog clause normal-form. For the same reasons as within eLP, we desire to make
use of a normal-form for λ□Prolog clauses within DeLP. Recall that within §8.2, we developed
the following λ□Prolog normal-form:

D_nf ::= D_nf , D_nf | D_∀
D_∀  ::= ∀x. D_∀ | A <= Gw | (□ D_□∀) <= Gw
D_□∀ ::= ∀y. D_□∀ | A <= Gs

λ□Prolog clauses may thereby be represented either in the preceding λProlog normal-form,
or else as an atomic clause-head A, a weak enabling goal Gw, a strong enabling goal Gs
(either of which may be true in the degenerate case), a set of universal variables X that
appears outside the scope of the optional □, and a set of universal variables Y that appears
within the scope of □.

Rather than actually maintaining two normal-forms, DeLP uses the extended normal-form
with the additional inclusion of a boolean flag indicative of whether the original clause
contained □. This distinction is necessary in order to differentiate (□ A) <= G and A <= G,
as the former is sufficient for deriving □ A, while the latter is not.
As within eLP, λ□Prolog goals are reduced to atomic subgoals for solution, as the
meta-interpreter of §8.6 illustrates. Once goals have been so reduced, the operational behavior of
the DeLP interpreter may be characterized as follows (as summarized in §8.3):

• Select a series of clauses from 𝒫 which may be applicable to the solution of Ga. If Ga
is being strongly solved, we may ignore clauses of the unboxed normal-form, i.e., those
that do not contain □.

• Create new logical variables Ŵ and Ŝ for each universal variable in W and S, and
then substitute Ŵ for W and Ŝ for S in A, Gw, and Gs, yielding Â, Ĝw, and Ĝs,
respectively.

• Unify Â and Ga. If unsuccessful, backtrack and choose another clause.

• For strong solution, solve (Ĝw , □ Ĝs). For weak solution, solve (Ĝw , Ĝs).
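
The steps above can be sketched in Python for the first-order case. This is a hypothetical illustration and not DeLP’s Common Lisp code: clauses are assumed to be stored as (head, Gw, Gs, boxed) tuples with goals given as lists of atoms, the two sets of universal variables are renamed together, and all names (Var, unify, fresh, solve, provable, PROGRAM) are invented for the example. The boolean argument strong plays the role of the context flag.

```python
import itertools

class Var:
    _ids = itertools.count()
    def __init__(self, name):
        self.name = f"{name}{next(Var._ids)}"
    def __repr__(self):
        return self.name

def walk(t, s):
    while isinstance(t, Var) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Extend substitution s so that a and b agree, or return None."""
    a, b = walk(a, s), walk(b, s)
    if a is b:
        return s
    if isinstance(a, Var):
        return {**s, a: b}
    if isinstance(b, Var):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return s if a == b else None

def fresh(t, env):
    """Replace capitalized names (universal variables) by new logical variables."""
    if isinstance(t, str) and t[:1].isupper():
        if t not in env:
            env[t] = Var(t)
        return env[t]
    return tuple(fresh(x, env) for x in t) if isinstance(t, tuple) else t

# (head, Gw, Gs, boxed): an assumed flattening of the extended normal-form.
PROGRAM = [
    (("weapon", "Z"), [], [("gun", "Z")], True),   # !! weapon Z <= gun Z
    (("gun", "obj1"), [], [], False),              # unboxed fact (training instance)
    (("gun", "obj2"), [], [], True),               # boxed fact
]

def solve(goal, strong, s, program):
    """Yield substitutions solving an atomic goal; strong is the context flag."""
    for head, gw, gs, boxed in program:
        if strong and not boxed:
            continue                               # unboxed clauses cannot yield a boxed goal
        env = {}
        s1 = unify(goal, fresh(head, env), s)
        if s1 is None:
            continue                               # backtrack: try the next clause
        # Gw is always solved weakly; Gs is solved strongly exactly when the goal is.
        for s2 in solve_all([fresh(g, env) for g in gw], False, s1, program):
            yield from solve_all([fresh(g, env) for g in gs], strong, s2, program)

def solve_all(goals, strong, s, program):
    if not goals:
        yield s
        return
    for s1 in solve(goals[0], strong, s, program):
        yield from solve_all(goals[1:], strong, s1, program)

def provable(goal, strong, program=PROGRAM):
    return next(solve(goal, strong, {}, program), None) is not None

print(provable(("weapon", "obj1"), strong=False))
print(provable(("weapon", "obj1"), strong=True))
print(provable(("weapon", "obj2"), strong=True))
```

With this program, weapon obj1 is weakly but not strongly solvable, because gun obj1 is unboxed, while weapon obj2 is strongly solvable.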

The ‘!!’ notation. In order to ease programming within our λ□Prolog prototype, we
included the !! convention (introduced in §4.3) for top-level λ□Prolog clauses. The realization
of !! is particularly straightforward, as it simply requires merging W into S within the above
representation.


9.2  Implementing  “rule” 

In Chapter 5 we established that because there is no provision for universally quantifying
existing free variables, rule is not implementable within a λProlog meta-interpreter. Thus
for us to actually experiment with the construct and to run the examples we have presented,
it was necessary to implement rule within eLP.

Recall from §5.3.2 that the alternative operational definition of

𝒫 ⊢ rule G (∀X. G_X => D_X) K

is as follows:

1. Create new logical variables X̂ for each universal variable in X, and then substitute
X̂ for X in G_X and D_X, yielding G_X̂ and D_X̂.

2. Unify G and G_X̂.

3. Solve G.

4. Let Y = free(G) − free(𝒫).

5. Solve (∀Y. ∀X̂. D_X̂) => K.

∀X̂ is correct, as those variables of X̂ which rule has instantiated no longer appear free in
D_X̂ (a side-effect of unification).

The  implementation  of  rule  within  this  framework  is  straightforward,  save  for  two  consid¬ 
erations: 

• Correctly limiting Y so that the derived assumption is not over-general, i.e., so that it
does not universally generalize variables free within pending assumptions.

•  Handling  the  heretofore  unaddressed  topic  of  higher-order  constraints. 

9.2.1  Variables  Free  in  Assumptions 

We emphasized in Chapter 5 that the universal generalization step associated with rule
must not quantify variables free in assumptions (i.e., within 𝒫), as such a generalization
violates the declarative nature of the rule construct. The most straightforward solution
would be simply for DeLP to maintain a set T of all the variables currently free in 𝒫. This
would require that, before making the local assumption D, T first be augmented with
any variables free in D. (Subsequent instantiation of variables within T does not pose a
problem, as instantiated variables no longer occur in goals and clauses.) rule’s universal
generalization step, then, involves subtracting this set from the candidates for universal
generalization.
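
The subtraction can be sketched in Python. This is a minimal illustration under assumptions, not the DeLP mechanism: terms are tuples, variables are instances of a hypothetical Var class, all terms are taken to be already dereferenced, and the set T is recomputed from a list of assumptions rather than maintained incrementally.

```python
class Var:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

def free_vars(term):
    """Set of variables occurring in a (fully dereferenced) term."""
    if isinstance(term, Var):
        return {term}
    if isinstance(term, tuple):
        found = set()
        for sub in term:
            found |= free_vars(sub)
        return found
    return set()

def generalizable(goal, assumptions):
    """Y = free(G) - free(P): variables of the solved goal that no assumption mentions."""
    in_assumptions = set()                 # plays the role of the set T
    for d in assumptions:
        in_assumptions |= free_vars(d)
    return free_vars(goal) - in_assumptions

x, y = Var("x"), Var("y")
goal = ("q", x, ("f", y))
assumptions = [("p", y)]                   # y is free in a pending assumption
print(generalizable(goal, assumptions))    # only x may be universally generalized
```

Quantifying y here would change the meaning of the pending assumption p y, which is why it is excluded.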

A  problem  with  the  above  strategy  is  that  it  is  potentially  computationally  expensive.  A 
second  alternative  is  to  maintain  the  set  of  local  assumptions  (that  potentially  contain  free 
variables).  To  determine  which  variables  to  universally  generalize,  these  assumptions  are 
searched  for  occurrences  of  these  candidates.  While  a  brute-force  approach  to  this  search 
would  be  even  more  expensive  than  the  preceding  algorithm,  the  use  of  time  stamps  on 
variables  and  expressions  should  substantially  reduce  this  overhead:  through  time  stamps, 
older expressions need not be searched for occurrences of newer variables.¹,²

Within the existing system, neither of the above strategies is implemented. Instead, within
our prototype we made the expedient choice of universally generalizing all free variables
within an assumption! This has not proven to be a problem for experimentation, but
would certainly be unacceptable for anything further.

¹ For timestamping to be effective, it is necessary that unification cannot result in the instantiation of old
variables with newer ones, as this would require re-stamping the containing expressions. Instead, unification
is realized by binding newer variables to old.

² Didier Rémy of INRIA has shown that time stamping leads to substantially improved performance for
the related problem of closing type variables within the ML let construct.

9.2.2 Constraints

In §5.3.4 we mentioned that higher-order unification requires the accumulation of constraints.
Simply put, such constraints are necessary to represent unifications that do not result in
variable instantiation: recall from §3.2 that f a = g a a allows the variable f to be instantiated
with any of λx. g a a, λx. g x a, λx. g a x, or λx. g x x, none of which is an instance of another.
Similarly g could be instantiated with λx. λy. f a, λx. λy. f x, or λx. λy. f y. By representing
f a = g a a as a unification constraint, subsequent computation is free to instantiate f or g
with any of the above.
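
The four candidate instantiations of f can be enumerated mechanically. The Python sketch below is only an illustration of this one example, not a higher-order unification algorithm: it treats g as a fixed constant, represents applications as tuples, and uses hypothetical helper names (abstractions, substitute). Each body is obtained by replacing some subset of the occurrences of a with the bound variable x, and substituting a back for x recovers g a a in every case.

```python
from itertools import product

def abstractions(term, const, var="x"):
    """All bodies obtained by replacing any subset of occurrences of const with var."""
    if term == const:
        return [const, var]
    if isinstance(term, tuple):
        choices = [abstractions(sub, const, var) for sub in term]
        return [tuple(combo) for combo in product(*choices)]
    return [term]

def substitute(term, var, value):
    """Replace every occurrence of var in term by value (beta-reduction of (λvar. term) value)."""
    if term == var:
        return value
    if isinstance(term, tuple):
        return tuple(substitute(sub, var, value) for sub in term)
    return term

target = ("g", "a", "a")                   # the right-hand side g a a
bodies = abstractions(target, "a")         # candidate bodies for f = λx. body
for body in bodies:
    print("λx.", " ".join(body))
```

Because no one of these bodies is an instance of another, committing to any single one would lose solutions, which is why the equation is kept as a constraint instead.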

Since these constraints are essential to the process of higher-order unification, they must
be incorporated within the assumptions derived through rule. Within eLP, constraints are
passed along as part of the interpretation environment. This is why, for example, we did
not need to treat them within our meta-interpreters of Chapter 8: that detail of λ□Prolog
implementation was handled directly within λProlog. However, the universal generalization
of variables disqualifies the existing DeLP constraints. Thus we need a means by which to
capture the persistent constraints required by rule.

Within the D-forms assumed by rule in DeLP, constraints are represented simply as a
conjunction of higher-order (unification) equations over λProlog terms. Hence, as mentioned,
the actual form for rule’s assumption is ∀X̂. ∀Y. D_X̂ <= ∃Z. C_Z, where C_Z is the
conjunction of the pending constraints, and Z represents all variables occurring only in C_Z. We
return to this topic in §9.3.2.


9.3  Implementing  “rule_ebg” 

Recall from §5.4 that the eLP-oriented operational interpretation of

rule_ebg G (□ ∀X. G_X => D_X) K

is as follows:

1. Solve G with EBG enabled, resulting in the explanation-based generalization
□ ∀Y. GG_Y <= DD_Y.

2. Create new logical variables X̂ for each universal variable in X, and then substitute
X̂ for X in G_X and D_X, yielding G_X̂ and D_X̂.

3. Create new logical variables Ŷ for each universal variable in Y, and then substitute Ŷ
for Y in GG_Y and DD_Y, yielding GG_Ŷ and DD_Ŷ.

4. Unify GG_Ŷ and G_X̂.

5. Solve (□ ∀Ŷ. ∀X̂. D_X̂ <= DD_Ŷ) => K.
(write M) (write MM) <= !,
        write M.

wsolve (read G) (read GG) DD <= !,
        read λt. ∃tt.
        wsolve (G t) (GG tt) (DD tt).

Figure 9.1: Generalizing meta-interpreter: Special goals.


9.3.1  Specials 

We have defined explanation-based generalization over computation expressed in the logic
of λ□Prolog, but we have not discussed how a generalizing interpreter could handle
extra-logical features, such as !, input/output, or arithmetic. Cut and input/output are especially
relevant as they have been used to define the interactive problem solvers introduced in
Chapters 6 and 7. Yet, these extra-logical constructs do not appear in the explanation-based
generalizations we have illustrated thus far. This is because the programs that include these
features, for example the interactive tactical of §6.1, are boxed, and therefore do not occur
in the resulting derived rules.³

Implementing generalization over these constructs is problematic in that they cannot be
realized at the level of our abstract interpreter. Essentially, specials are handled by executing
them for the particular case (i.e., the current goal), and incorporating an analog within the
associated generalization. Figure 9.1 illustrates this strategy in the treatment of several
pertinent λProlog specials within the generalizing meta-interpreter of §8.6.

9.3.2  Higher-order  Constraints  and  EBG 

Just as for rule, the higher-order constraints associated with an explanation-based
generalization must be represented in the D-form assumed by rule_ebg, and as you would expect,
this is handled in an identical fashion.

Complexity of higher-order constraints. However, the constraint sets that result
from EBG tend to be substantially more complicated than those associated with rule.
Appendix A.7 gives a listing of the most complicated set of constraints we have yet
encountered: that resulting from the program transformation scenario of Chapter 7. Even to
those well-versed in λProlog, these constraints are inscrutable.

For more complex higher-order generalizations to be truly useful, methods must be devised
for making higher-order constraints more palatable to the programmer. There remains the
possibility (discussed briefly in Chapters 3 & 10) that higher-order unification (and hence
EBG), as it is presently defined, is too general to be of value for more complex situations.
Perhaps a more restricted unification algorithm could effectively capture the generalization
without yielding an inscrutable result. In other words, higher-order unification may itself be
such a rich mechanism that explanation-based generalizations based upon it are potentially
over-general. For now, the question remains open.

³ Nevertheless, the EBG algorithm as we have defined it initially includes specials that occur within boxed
clauses in DD, its accumulator for EBG preconditions (§8.6). These are then removed in the course of the
simplification that eliminates necessarily true (boxed) preconditions.

Performance. There is a more practical concern following from the complexity of these
higher-order constraints, and that is performance. Somehow our explicit representation of
higher-order constraints as λ-term unification equations can lead to substantially increased
computation costs, again for our more complex examples. In fact, for the program
transformation illustration of Chapter 7, the application of the explanation-based generalization
takes longer than its generation! Obviously this is grossly unacceptable for anything more
than an experimental system, since it violates one of the primary missions of EBG: improving
performance.

There are a couple of contributing factors to this performance degradation. One expects
that the computation involved in applying an explanation-based generalization to a goal
should be a subset of that required to solve the goal without generalization. Given this,
the observed performance loss must demonstrate inappropriate representation choices for
the derived generalizations. From this, it is easy to target the complex constraint equations
associated with the derived rules in question. We believe that the additional overhead
incurred through our explicit representation of constraints as λ-terms is at least partially
responsible. In particular, this inefficiency may be the result of information lost in the
copying of terms (for example, once identical (“eq”) terms becoming merely equivalent).
If this is correct, effective performance will require better data structures for maintaining
constraints across λ□Prolog computations.

The above is particularly problematic when combined with a limitation of eLP’s present
higher-order unification algorithm: it can behave eagerly in its commitment to particular
solutions [39]. When the unification algorithm makes the wrong guess, a significant amount
of backtracking is required. For the larger constraint sets, this cost may be substantial.
Instead, what one would like is maximally lazy unification.

While the above speculation is not satisfying, we believe that the exploration of higher-order
constraint representation and satisfaction is itself a substantial research problem. Thus far,
our efforts have focused on defining the language and learning mechanisms of λ□Prolog.
We anticipate that the further study of higher-order constraints and unification will lead
to scrutable encodings and fast algorithms. Indeed, the long term relevance of this work is
dependent upon success in this area.
Chapter 10


Conclusion 


10.1  Summary 

As stated at the outset, this thesis should broadly be viewed as a language design effort.
The result, λ□Prolog, encompasses extensions to λProlog that afford rudimentary modal
reasoning, higher-order EBG, and logically-motivated constructs for controlling generalization
and assimilation. By incorporating learning mechanisms within the logic programming
language, λ□Prolog defers to the programmer the problem of determining when to learn. It
is our belief that only the system designer is generally positioned to effectively address this
problem, leveraging his familiarity with both the problem domain and the problem solver.
Thus, while λ□Prolog is not itself a learning system, it is intended to serve as a high-level
foundation for the implementation of such systems.

Through the framework of λ□Prolog, this thesis offers a number of contributions:

•  The  use  of  the  modal  operator  □  to  provide  an  alternative  formulation  of  EBG. 

•  The extension of the EBG algorithm to treat a higher-order representation language,
and the formulation of higher-order EBG via a λ□Prolog meta-interpreter.

•  A  logically-sound  mechanism,  rule,  for  universal  generalization  within  logic  programs. 

•  A  generalization  of  the  rule  construct,  rule_ebg,  to  afford  EBG. 

•  The  integration  of  all  of  the  above  in  an  environment  that  supports  user-guided  problem 
solving  and  learning. 

•  A prototype implementation, □eLP.

•  A  suite  of  examples. 

Just as this thesis borrows from a number of different areas, we believe its results are relevant
to a range of research efforts: formal methods for higher-order domains (e.g., program
development, theorem proving, and natural language); λProlog and logic programming;
explanation-based generalization and machine learning; modal logic; and language design in
general.
10.2  Future Work

10.2.1  Further Experimentation with λ□Prolog

While we have presented a number of examples in the course of developing this thesis, the
majority of our efforts have been devoted to constructing the λ□Prolog framework rather
than exploring its application. Thus, one of the primary future directions is further
experimentation. λProlog is currently being used as a research vehicle in a number of different
areas: logics, programming languages, and natural language [106, 88, 85, 103]. It would be
interesting to further consider what impact λ□Prolog can have on these domains, particularly
since other researchers have already done the lion’s share of the necessary formalization by
programming in λProlog. A particularly attractive domain to which we have as yet given
only cursory consideration is the proofs-as-programs paradigm introduced in §2.1. Like the
other domains considered herein, proofs-as-programs is interesting to consider because of the
relevance of higher-order expressivity and the reliance upon interactive proof development.

In another direction, we have done relatively little in the way of re-implementing examples
from the EBG literature. This, of course, is because we have been primarily focused upon
the higher-order domains that initially motivated our work. However, another worthwhile
means of exercising λ□Prolog would be to consider the encoding of more extensive first-order
problems. Of particular interest is the further consideration of the possible interplay between
the □ operator and dynamic operationality criteria.

Of the many remaining questions, perhaps the predominant one is whether a relatively
complete, higher-order apprentice learning system can be effectively realized within λ□Prolog
(or within any other language, for that matter). While we have provided example scenarios
in Chapters 6 and 7, we have not yet produced such a system. This is in part due to
limitations within our present implementation, both in terms of performance (discussed in
Chapter 9), and in terms of functionality, such as the lack of a mouse interface (discussed
in Chapter 7). This is also, of course, due to time constraints.

10.2.2  Practical  Considerations 

In Chapter 9 we raised a number of limitations that presently hinder the application of
λ□Prolog to more complex domains:

•  Higher-level support for interaction

•  Higher-order unification, constraints, and lazy unification

Another issue is that of instrumentation for performance measurement and analysis: while
we have deferred to the programmer the task of determining when EBG is invoked and how
its results are exploited, λ□Prolog does not provide the designer with the facilities required
to guide this process and ensure that performance actually improves. While we have not
investigated the issue, we believe that such tools could be smoothly integrated with λ□Prolog.
10.2.3  Future Work on λProlog and Explanation-Based Generalization

λProlog and EBG are fundamentally melded within this dissertation, yet each continues to
evolve through current research:

For λProlog, among other efforts, Miller is presently considering the restriction of λProlog to
a subset Lλ that admits decidable unification and most general unifiers. It is unclear to what
extent explanation-based generalizations of Lλ computation are themselves legal Lλ clauses.
There remains the possibility, however, that Lλ’s restrictions upon higher-order unification
would address the problems associated with complex higher-order constraints (see §9.3.2).

For EBG, in addition to work we have already cited, researchers are presently extending the
paradigm to generalize from failure as well as success [25], and to more effectively generalize
over iterative and recursive computations [19, 118]. (To illustrate the inadequacy of λ□Prolog
in this latter respect, consider from Chapter 6 that the explanation-based generalization of
the repeat tactical commits to the particular number of iterations used within the example
solution.)

Even more speculatively, and of particular interest to us, is the further development of the
‘language-based’ approach to learning (of which λ□Prolog is an exemplar) to encompass other
EBG methodologies, other paradigms of generalization (e.g., similarity-based methods [65,
12]), and analogical problem solving techniques [10, 15, 23, 57, 98].

10.2.4  Logical  Foundations  of  □  and  EBG 

While we have defined an interpreter for λ□Prolog, we have not considered the underlying
modal  semantics  of  our  intuitionistic  calculus.  What,  for  example,  is  its  relationship  to 
S4  and  S5  [16,  69]?  Also,  the  higher-order  EBG  algorithm  we  have  illustrated  should  be 
verified.  Finally,  it  is  worth  further  considering  the  formal  relationship  between  □,  EBG, 
and  operationality  criteria. 

10.2.5  Incorporating “rule” and “rule_ebg” within other Logic
Programming Languages

On an alternative front, one could consider the incorporation of rule (and also rule_ebg)
within Prolog as a declarative alternative to assert and retract. Assuming the necessary
syntax to express variable binding (for the second argument of rule), rule could be directly
added to Prolog. Presumably, this would be accomplished without the addition of implication
and the explicit scope K, but instead by extending the program in the manner of
Prolog’s lemma (introduced within §5.2.1). The implementation would be substantially
easier than in λProlog, since Prolog programs are always closed, and thus we can simply
quantify over all logic variables (in the manner of assert) without having to check for
variables free in current assumptions. The implementation problems posed by rule in the
context of Prolog are thus very similar to those associated with assert, and variations on
the techniques proposed by Lindholm & O’Keefe [79] are applicable.
10.2.6  Logical Foundations of “rule” and “rule_ebg”


Intuitively, logic programming proofs require the repeated solution of instances of common
goals. More formally, the execution of a logic program usually produces what is known as
a cut-free or normal proof of the query (actually, such proofs belong to the even more
restrictive class of uniform proofs [86]). This is no longer true for a language extended with
rule and rule_ebg, wherein derived proofs may be non-normal and substantially shorter than
the corresponding normal proofs (which is the reason why rule and rule_ebg are effective).
Can we characterize the class of deductions which can be found by programs employing rule?
By those using rule_ebg? Is there a way to extend these constructs within the logic
programming paradigm so that even more general deductions can be found?

Finin, et al. have considered a more complete integration of forward and backward
chaining [46]. Their approach supports extended computations in both directions by allowing
the programmer to write both forward and backward chaining Horn clauses. We do not see,
however, a way to express the interplay between forward and backward reasoning required by
rule within their language. It would be interesting to consider whether their approach could
be fruitfully combined with the higher-order constructs and scoping available in λProlog,
and then also with rule and rule_ebg.

10.2.7  Ramifications  of  “rule” 

Dale Miller has pointed out that rule permits the formulation of at least the extra-logical
predicate flexible (flexible is the higher-order equivalent of var): flexible M indicates
that M’s head is a variable; that is, that M can be unified with any term. He provides
the definition

    flexible M ⇐ ∀p,q. (∀x. p x) ⇒ rule (p M) (∀x. p x ⇒ q x) (∀x. q x)

In some ways, flexible’s extra-logical nature makes λ□Prolog less declarative:

    ?- flexible X, X = a.
succeeds, while

    ?- X = a, flexible X.
does not.
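
The order dependence can be reproduced outside of logic programming. The following is a minimal Python sketch, assuming a first-order approximation in which "flexible" simply means "still an unbound variable"; it does not model higher-order terms with variable heads, and the names Var, deref, unify_atom, query1, and query2 are hypothetical helpers introduced only for this illustration.

```python
class Var:
    """A logic variable; `ref` stays None while the variable is unbound."""
    def __init__(self, name):
        self.name = name
        self.ref = None

def deref(t):
    """Follow bindings until reaching an unbound variable or a constant."""
    while isinstance(t, Var) and t.ref is not None:
        t = t.ref
    return t

def flexible(t):
    """Extra-logical test: true only while t is still an unbound variable."""
    return isinstance(deref(t), Var)

def unify_atom(t, atom):
    """Bind t to a constant, or compare if t is already bound."""
    t = deref(t)
    if isinstance(t, Var):
        t.ref = atom
        return True
    return t == atom

def query1():
    # ?- flexible X, X = a.
    x = Var("X")
    return flexible(x) and unify_atom(x, "a")

def query2():
    # ?- X = a, flexible X.
    x = Var("X")
    return unify_atom(x, "a") and flexible(x)
```

Swapping the two conjuncts changes the outcome, which is exactly what a purely declarative reading of conjunction would forbid.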


10.2.8  Alternatives to “rule” and “rule_ebg”

There are a few examples closely related to the ones we have given herein for which rule
and rule_ebg do not appear to be powerful enough. The problem is that it is not possible
to translate the universal quantifiers introduced during the universal generalization step
into explicit quantifiers at the object-level. That is, there is no means by which the
universally quantified assumption can be accessed by the program. Finding a declaratively and
operationally satisfactory solution is yet another topic for future research.


Appendix  A 

Some  Unabridged  Examples 


Within this Appendix we include the unabridged λProlog source code that produced many
of the examples upon which this dissertation has so heavily relied.

Notation. Up to this point, I have made a substantial effort to typeset λProlog syntax
attractively. However, this does not seem to be a worthwhile endeavor for the more extended
examples to follow. Thus, it becomes necessary to introduce the ASCII-restricted syntax of
eLP, our implementation of λProlog described in Chapter 9.

Instead of italics and boldface, eLP λ-term variables begin with an uppercase character,
while λ-term constants are lowercase. This same distinction is made for type constants and
type variables. In fact, the only exception is that variables explicitly bound by λ may be of
either case (since constants may never follow a λ).

λ-abstraction is represented with \, which binds the variable preceding it; that is, \ acts as
an infix λ, such as within x:A\y:B\(f x y), which is equivalent to λx:A.λy:B.f x y. (This
is probably the most difficult aspect of eLP syntax to get used to.)

The following table summarizes the mapping from the typeset notation to ASCII:

    λProlog      eLP
    -------      ---
    c            c
    X            X
    →            ->
    λx.          x\
    ∀x.          pi x\
    ∃x.          sigma x\
    □            box
    ⇒            =>
    ⇐            <=
    x^a          expn x a
    x/a          x div a
A.1  Clause simplification


module simplify.

type false    o.
type simpl    o -> o -> o.
type simpl1   o -> o -> o.

simpl true        true :- !.
simpl (H1 , H2)   H    :- !,
                          simpl H1 H1i, simpl H2 H2i,
                          simpl1 (H1i , H2i) H.
simpl (H1 ; H2)   H    :- !,
                          simpl H1 H1i, simpl H2 H2i,
                          simpl1 (H1i ; H2i) H.
simpl (H1 => H2)  H    :-
                          simpl H1 H1i, simpl H2 H2i,
                          simpl1 (H1i => H2i) H.
simpl (pi H1)     H    :-
                          pi x\(simpl (H1 x) (H1i x)),
                          simpl1 (pi H1i) H.
simpl (sigma H1)  H    :- !,
                          pi x\(simpl (H1 x) (H1i x)),
                          simpl1 (sigma H1i) H.
simpl (box H1)    H    :- !,
                          simpl H1 H1i,
                          simpl1 (box H1i) H.
simpl Ha          Ha.

simpl1 true               true  :- !.
simpl1 (true , H2)        H2    :- !.
simpl1 (H1 , true)        H1    :- !.
simpl1 (false , H2)       false :- !.
simpl1 (H1 , false)       false :- !.
simpl1 (true ; H2)        true  :- !.
simpl1 (H1 ; true)        true  :- !.
simpl1 (false ; H2)       H2    :- !.
simpl1 (H1 ; false)       H1    :- !.
simpl1 (true => H2)       H2    :- !.
simpl1 (H1 => true)       true  :- !.
simpl1 (false => H2)      true  :- !.
simpl1 (H1 => false)      false :- !.
simpl1 (pi X\ true)       true  :- !.
simpl1 (sigma X\ true)    true  :- !.
simpl1 (pi X\ false)      false :- !.
simpl1 (sigma X\ false)   false :- !.
simpl1 (box true)         true  :- !.
simpl1 (box false)        false :- !.
simpl1 H                  H.
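
The two-level structure of this module (simplify the subformulas first, then apply one local rewrite at the root) can be sketched as follows. This is a minimal Python sketch, not the thesis's implementation: formulas are assumed to be nested tuples such as ("and", a, b), atoms are plain strings, and the quantifier cases are omitted. The rewrite order follows the simpl1 clauses as printed, including the clause that maps (H1 => false) to false.

```python
def simpl1(f):
    """One local rewrite at the root, in the order of the simpl1 clauses."""
    if isinstance(f, str):
        return f
    op = f[0]
    if op == "and":
        _, a, b = f
        if a == "true":
            return b
        if b == "true":
            return a
        if "false" in (a, b):
            return "false"
    elif op == "or":
        _, a, b = f
        if "true" in (a, b):
            return "true"
        if a == "false":
            return b
        if b == "false":
            return a
    elif op == "imp":
        _, a, b = f
        if a == "true":
            return b
        if b == "true" or a == "false":
            return "true"
        if b == "false":
            return "false"  # mirrors the listing's (H1 => false) clause
    elif op == "box":
        if f[1] in ("true", "false"):
            return f[1]
    return f  # catch-all: nothing to simplify

def simpl(f):
    """Simplify subformulas bottom-up, then rewrite the root once."""
    if isinstance(f, str):
        return f
    return simpl1((f[0],) + tuple(simpl(x) for x in f[1:]))
```

For instance, simpl(("and", "true", ("or", "p", "false"))) reduces the inner disjunction to p and then drops the trivially true conjunct.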
A.2  Rudimentary Resolution Theorem Prover

This  implementation  incorporates  a  unit  strategy  —  that  is,  use  smallest  clauses  first.  It 
relies  upon  user  interaction  to  control  the  whole  process. 


module resolve_unit.
import simplify.

type clause_          o -> int -> o.
type min_clause       o -> int -> o.
type min_clause_aux   o -> int -> int -> o.
type resolve          o -> o -> o -> o.
type rtp              o.
type rtp_aux          o -> int -> o -> o.
type write_clauses    o.

resolve (P ; Q)  S        (P ; R) :- resolve Q S R.
resolve (P ; Q)  S        (Q ; R) :- resolve P S R.
resolve S        (P ; Q)  (P ; R) :- resolve Q S R.
resolve S        (P ; Q)  (Q ; R) :- resolve P S R.
resolve P        (not P)  false.
resolve (not P)  P        false.

min_clause P P# :- min_clause_aux P P# 1.

min_clause_aux P P# Size :-
    (clause_ P Size, P# = Size) ;
    (Size < 10, Size1 is Size + 1, min_clause_aux P P# Size1).

rtp :- rule (rtp_aux R R# G)
            (pi R\ (pi R#\ (clause_ R R# <= rtp_aux R R# G)))
            G.

rtp_aux R R# G :-
    min_clause P P#,
    min_clause Q Q#,
    resolve P Q R1, simpl R1 R,
    R# is ((P# + Q#) - 2),
    not (clause_ R R#),   %% This is a hack, one may be an instance of the other.
    nl,
    nl, writesans "First clause   ", write P,
    nl, writesans "Second clause  ", write Q,
    nl, writesans "Resolved to    ", write R,
    nl, writesans "Assume resolvant? [y,n,q,t] ",
    read ans\ ( (ans = n, fail) ;    %% no cut here, we want to backtrack
                (ans = q, !, G = write_clauses) ;
                (ans = t, !, G = top) ;
                (ans = y, !, G = rtp) ).

write_clauses :-
    nl, nl, writesans "Resolution complete",
    nl, nl, writesans "Clauses:  ", !,
    ((min_clause R R#, nl, writesans "  ", write R, fail) ; nl).
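
The core of the listing (binary resolution on complementary literals, tried on small clauses first) can be sketched without the interactive prompts. This is a minimal propositional Python sketch under stated assumptions: clauses are frozensets of literals rather than nested disjunctions, a negative literal is the tuple ("not", atom), there is no unification, and every new resolvent is accepted automatically. The names negate, first_new_resolvent, and refute are hypothetical.

```python
def negate(lit):
    """Complement of a literal; negative literals are ("not", atom)."""
    return lit[1] if isinstance(lit, tuple) else ("not", lit)

def resolve(c1, c2):
    """Yield each binary resolvent of two clauses (frozensets of literals)."""
    for lit in c1:
        comp = negate(lit)
        if comp in c2:
            yield (c1 - {lit}) | (c2 - {comp})

def first_new_resolvent(known):
    """Search pairs of clauses, smallest clauses first, for a new resolvent."""
    ordered = sorted(known, key=lambda c: (len(c), sorted(map(repr, c))))
    for i, c1 in enumerate(ordered):
        for c2 in ordered[i + 1:]:
            for r in resolve(c1, c2):
                if r not in known:
                    return r
    return None

def refute(clauses):
    """Return True if the empty clause (false) is derivable."""
    known = {frozenset(c) for c in clauses}
    while True:
        r = first_new_resolvent(known)
        if r is None:
            return False      # saturated without deriving false
        if not r:
            return True       # the empty clause was derived
        known.add(r)
```

Because duplicate literals merge in a set, a resolvent here has at most (P# + Q#) - 2 literals, whereas the listing computes exactly that figure as the size of the new clause.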




eval true        true        :- !.
eval (G1 , G2)   (G3 , G4)   :- !, eval G1 G3,
                                   eval G2 G4.
eval (G1 ; G2)   G           :- !, (eval G1 G; eval G2 G).
eval (D => G)    G1          :- !, ndform D D1, hyp D1 => eval G G1.
eval (pi G)      (pi G1)     :- !, pi X\ (eval (G X) (G1 X)).
eval (sigma G)   (sigma G1)  :- !, eval (G X) (G1 X).
eval Gatom       SubG        :- hyp D,
                                select D (Gatom :- SubG1),
                                eval SubG1 SubG.
eval Gatom       SubG        :- Gatom, SubG = true, !.
eval Gatom       Gatom.

Figure A.1: Evaluator.


A.3  Partial evaluator for λProlog

This section contains a more thorough development of the partial evaluator peval originally
presented within §4.6. We also provide a more extensive application of peval: the partial
evaluation of an interpreter with respect to a particular object language.


A.3.1  A λProlog Evaluator

To simplify the presentation of the partial evaluator, we first introduce a λProlog evaluator
given in Figure A.1. To a large degree this evaluator follows both the original presentation
of peval and the λProlog meta-interpreter (§8.5). We include it nevertheless for the sake of
completeness.

The predicate eval reduces one goal (its first argument) to another (its second).1 In the
last three clauses of eval, an atomic goal is reduced either (1) by applying a relevant clause
from the object-logic program 𝒫_ob (enumerated via hyp), (2) by evaluating it directly in
λProlog, or (3) by a no-op. For (1) the select predicate, the code for which may be found
in Figure A.2, nondeterministically selects a candidate clause to reduce the goal Ga. select
replaces universal variables with new logic variables to facilitate the unification of the clause
head and Ga. For (2) we avoid coding the evaluation of special goals (e.g., !, =, or arithmetic)
by realizing them directly (i.e., reflecting them) within λProlog. (To be practical, the above
evaluator should also simplify the result of evaluation before yielding an answer.)


1While we could accurately use the term ‘meta-evaluator’ (since both the object- and meta-language are
λProlog), it would become cumbersome when we turn to discussion of the ‘partial-meta-evaluator.’
select Ds             D :- select_and Ds D.
select_and (D1 , D2)  D :- !, (select_and D1 D;
                               select_and D2 D).
select_and D1         D :- select_pis D1 D.
select_pis (pi D1)    D :- !, select_pis (D1 X) D.
select_pis D          D.


Figure A.2: Clause selection.


A.3.2  A Partial Evaluator

The partial evaluator of Figure A.3 is identical to the evaluator, except that rather than
directly applying clauses (1) or directly interpreting goals in λProlog (2), peval queries
the user before committing to any such operation; that is, peval is a user-guided partial
evaluator. Those queries are handled by the auxiliary predicates given in Figure A.4.

This brings us back to the primary distinction between PE and EBG: partial evaluation
requires substantial amounts of search control in order to produce interesting specializations.
The partial evaluator includes no notion of 𝒟 and 𝒯, but the same results may be achieved
by explicitly not partially evaluating goals reduced by specific clauses.2 Deriving all the
rules that can be produced through PE is equivalent to finding the deductive closure of a
logic program. (Of course, heuristics can potentially reduce this problem by guiding PE
toward more interesting specializations.) EBG, on the other hand, uses an example solution
(as well as □ and operationality criteria) to determine what combination of clauses will in
essence be partially evaluated.
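
The role of search control in partial evaluation can be made concrete with a small sketch. The following Python fragment is an illustration under stated assumptions, not the thesis's peval: the object program is propositional (a dictionary from an atom to a list of clause bodies), and the interactive query to the user is replaced by a hypothetical callback named unfold that decides whether a given clause may be applied. Each result is one residual goal, a tuple of atoms left unevaluated.

```python
from itertools import product

def peval(goal, program, unfold):
    """Yield every residual goal (a tuple of atoms) for one atomic goal."""
    for body in program.get(goal, []):
        if unfold(goal, body):
            # Apply this clause, then partially evaluate each subgoal in turn.
            alternatives = [list(peval(g, program, unfold)) for g in body]
            for combo in product(*alternatives):
                yield tuple(atom for part in combo for atom in part)
    # The no-op case: the goal is left as it stands.
    yield (goal,)

# A tiny object program: a :- b, c.   b :- d.
program = {"a": [("b", "c")], "b": [("d",)]}

always = lambda goal, body: True
never = lambda goal, body: False

all_specializations = list(peval("a", program, always))
no_specializations = list(peval("a", program, never))
```

With the permissive policy the sketch enumerates every unfolding of a, which is the deductive-closure behaviour described above; a more selective callback plays the part that an example solution plays in EBG. A recursive object program would need a depth bound or a policy that eventually declines, which this sketch does not provide.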

If the partially evaluated logic program is again to be interpreted (e.g., through eval &
peval), the following top-level is sufficient: (Recall that all object clauses are accessed
through the predicate hyp.)

peval_top E (E :- G) :-
    rule (peval E G)
         (pi E\ (pi G\ (peval E G => hyp (E :- G))))
         top.

A.3.3  An Example Application

As a more extended example of partial evaluation and reflection, consider the meta-interpreter
of Figure A.5, which is taken from Takeuchi & Furukawa [127]. (The concept is borrowed
from Shapiro [117].) This meta-interpreter combines uncertainty or confidence factors
with its solution of goals. We shall apply this meta-interpreter to the object-program

2Actually,  this  is  not  quite  the  case,  since  PE  is  like  operationality  criteria  in  that  it  does  permit  internal 
proof  steps  to  be  abstracted  from  the  generalized  proof  (§4.5)  as  does  our  formulation  of  EBG. 
peval true        true        :- !.
peval (G1 , G2)   (G3 , G4)   :- !, peval G1 G3,
                                    peval G2 G4.
peval (G1 ; G2)   (G3 ; G4)   :- !, peval G1 G3,
                                    peval G2 G4.
peval (D => G)    G1          :- !, ndform D D1, hyp D1 => peval G G1.
peval (pi G)      (pi G1)     :- !, pi X\ (peval (G X) (G1 X)).
peval (sigma G)   (sigma G1)  :- !, peval (G X) (G1 X).
peval Gatom       SubG        :- hyp D,
                                 select D (Gatom :- SubG1),
                                 do_rule Gatom (Gatom :- SubG1) SubG.
peval Gatom       SubG        :- do_bottom Gatom SubG.
peval Gatom       Gatom.

Figure A.3: Evaluator.


do_rule Gatom (Gatom :- SubG1) SubG :- nl,
    nl, writesans "Goal to be partial evaluated: ", nl,
    writesans "  ", write Gatom, nl,
    nl, writesans "Selected rule: ", nl,
    writesans "  ", write Gatom, writesans " :- ", nl,
    writesans "    ", write SubG1, nl,
    nl, writesans "Use this rule? [y|n|s|e] ",
    read ans\
      ((ans = s, !, SubG = SubG1) ;
       (ans = n, !, fail) ;
       (ans = y, !, peval SubG1 SubG) ;
       (ans = e, !, eval SubG1 SubG) ;
       (!, nl, writesans "Illegal command: ", write ans, nl,
           do_rule Gatom (Gatom :- SubG1) SubG)).

do_bottom G SubG :- nl,
    nl, writesans "Goal to be partial evaluated: ", nl,
    writesans "  ", write G, nl,
    writesans "No more rules apply.", nl,
    writesans "Evaluate it directly in eLP (e.g., 'is', '<')? [y|n] ",
    read ans\
      ((ans = n, !, SubG = G) ;
       (ans = y, !, G, SubG = true) ;
       (!, nl, writesans "Illegal command: ", write ans, nl,
           do_bottom G SubG)).


Figure A.4: Partial Evaluator.
module meta_medic.
import lists.

type msolve    o -> list int -> o.
type mrule     o -> o -> int -> o.
type eft       int -> list int -> int -> o.
type product   list int -> int -> int -> o.
type cf_rule   o -> int -> o.

msolve true     (100 :: nil).
msolve (A , B)  Z            :- msolve A X, msolve B Y, append X Y Z.
msolve (not A)  (Cf :: nil)  :- msolve A (C :: nil), C < 20, Cf is 100 - C.
msolve A        (Cf :: nil)  :- mrule A B F, msolve B S, eft F S Cf.

eft X Y Z :- product Y 100 Y1, Z is ((X * Y1) div 100).
product nil A A.
product (X :: L) A X1 :- B is ((X * A) div 100), product L B X1.

mrule A B F    :- cf_rule (A :- B) F.
mrule A true F :- cf_rule A F.


Figure A.5: Meta-program with certainty factors.


of Figure A.6, which is concerned with the prescription of drugs (and is also taken from
Takeuchi & Furukawa [127]). For example, eval permits the solution of the following goal

( hyp (cf_rule (suffers_from scott peptic_ulcer :- true) 0 :- true),
  hyp (cf_rule (complains_of scott pain :- true) 100 :- true)
) => eval (msolve (should_take scott aspirin) Cf) G

yielding G = true and Cf = (42 :: nil).
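
The figure of 42 can be checked by hand from the certainty-factor clauses of Figures A.5 and A.6. The following Python sketch reproduces that arithmetic only; it is not a meta-interpreter. The function names combine, negate_cf, and should_take_aspirin are hypothetical, with combine standing in for the three-argument predicate printed as eft together with product, and integer division standing in for div on non-negative integers.

```python
def combine(factor, confidences):
    """Scale a rule's factor by the running product of its subgoal confidences."""
    acc = 100
    for c in confidences:
        acc = (c * acc) // 100          # product (X :: L) A X1
    return (factor * acc) // 100        # Z is ((X * Y1) div 100)

def negate_cf(c, threshold=20):
    """msolve (not A): fails (None) unless A's confidence is below the threshold."""
    if c >= threshold:
        return None
    return 100 - c

def should_take_aspirin(suffers_cf, complains_cf):
    """Confidence that the patient should take aspirin, or None on failure."""
    suppresses = combine(60, [100])                  # suppresses aspirin pain
    aggravates = combine(70, [100])                  # aggravates aspirin peptic_ulcer
    suffers = combine(suffers_cf, [100])             # assumed fact with body true
    complains = combine(complains_cf, [100])         # assumed fact with body true
    unsuitable = combine(80, [aggravates, suffers])  # unsuitable rule, factor 80
    not_unsuitable = negate_cf(unsuitable)
    if not_unsuitable is None:
        return None
    return combine(70, [complains, suppresses, not_unsuitable])

cf = should_take_aspirin(suffers_cf=0, complains_cf=100)
```

With the two assumptions from the goal above (confidence 0 for the ulcer, 100 for the pain), the unsuitability check contributes 100 and the result is 70 * 60 / 100 = 42, matching the reported answer.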

This meta-interpreter may be partially evaluated with respect to this object program, certain
results of which are illustrated within Figures A.7 & A.8. (Each of these derived clauses is
reported in Takeuchi & Furukawa [127]. We duplicate their results here because the examples
are informative and, more importantly, because the results illustrate an application of rule.)

It  is  essential  to  understand  that  we  are  herein  dealing  with  three  levels  of  language:  peval, 
the  meta-interpreter  msolve,  and  the  medical  object  language.  One  problem  with  always 
interpreting  the  results  of  partial  evaluation  is  simply  the  extra  cost  incurred  through  this 
extensive  layering  of  language.3  As  we  have  discussed,  the  solution  is  to  reflect  the  program 
being  manipulated  (in  this  case,  the  combination  of  meta-  and  object-program)  into  the 
logic  program,  and  thereby  run  it  directly.  This  may  be  accomplished  via  a  previously  listed 
revision  to  our  top-level: 

3This factor depends upon the nature of the interpreter and object language, but between one and two
orders of magnitude appears to be typical for λProlog.


module medic.
import meta_medic.

kind person      type.
kind drug        type.
kind symptom     type.
kind condition   type.

type should_take    person -> drug -> o.
type complains_of   person -> symptom -> o.
type suppresses     drug -> symptom -> o.
type unsuitable     drug -> person -> o.
type aggravates     drug -> condition -> o.
type suffers_from   person -> condition -> o.

cf_rule (should_take Person Drug :- complains_of Person Symptom,
                                    suppresses Drug Symptom,
                                    not (unsuitable Drug Person))
        70.
cf_rule (suppresses aspirin pain)
        60.
cf_rule (suppresses lomotil diarrhoea)
        65.

cf_rule (unsuitable Drug Person :- aggravates Drug Condition,
                                   suffers_from Person Condition)
        80.
cf_rule (aggravates aspirin peptic_ulcer)
        70.
cf_rule (aggravates lomotil impaired_liver_function)
        70.


Figure A.6: Object-program.
msolve (should_take Y1 Y2) (Y :: nil) :-
    msolve (complains_of Y1 Y3) Y6,
    msolve (suppresses Y2 Y3) Y9,
    msolve (unsuitable Y2 Y1) (Y11 :: nil),
    Y11 < 20, Y10 is 100 - Y11,
    append Y9 (Y10 :: nil) Y7, append Y6 Y7 Y4,
    eft 70 Y4 Y.
msolve (suppresses aspirin pain) (60 :: nil).
msolve (suppresses lomotil diarrhoea) (65 :: nil).
msolve (unsuitable Y10 Y12) (Y4 :: nil) :-
    msolve (aggravates Y10 Y11) Y15,
    msolve (suffers_from Y12 Y11) Y17,
    append Y15 Y17 Y13, eft 80 Y13 Y4.
msolve (aggravates aspirin peptic_ulcer) (70 :: nil).
msolve (aggravates lomotil impaired_liver_function) (70 :: nil).


Figure A.7: Results of partial evaluation.


msolve (should_take Y7 aspirin) (Y4 :: nil) :-
    msolve (complains_of Y7 pain) Y12,
    msolve (suffers_from Y7 peptic_ulcer) Y27,
    eft 80 (70 :: Y27) Y18,
    Y18 < 20, Y17 is 100 - Y18,
    append Y12 (60 :: Y17 :: nil) Y10,
    eft 70 Y10 Y4.

msolve (should_take Y7 lomotil) (Y :: nil) :-
    msolve (complains_of Y7 diarrhoea) Y6,
    msolve (suffers_from Y7 impaired_liver_function) M,
    eft 80 (70 :: M) Y15,
    Y15 < 20, Y14 is 100 - Y15,
    append Y6 (65 :: Y14 :: nil) Y4,
    eft 70 Y4 Y.


Figure A.8: Further results of partial evaluation.


peval_top E (E :- G) :-
    rule (peval E G)
         (pi E\ (pi G\ (peval E G => (E :- G))))
         top.
A.4  Generalizing interpreter for λ□Prolog

For the sake of completeness, we list the unabridged λProlog implementation discussed in
§8.5 & §8.6.


module metaebg.

type gsolve        o -> o -> o.
type hyp           o -> o.
type wsolve        o -> o -> o -> o.
type ssolve        o -> o -> o -> o.
type wmatch        o -> o -> o -> o -> o -> o.
type smatch        o -> o -> o -> o -> o -> o.
type bmatch        o -> o -> o -> o -> o -> o.
type meta_wsolve   o -> o -> o.
type meta_ssolve   o -> o -> o.
type breduce       o -> o -> o.
type reduce        o -> o -> o.
type reduce1       o -> o -> o.
type dosolve       o -> o -> o -> o.
wsolve true       true         true         :- !.
wsolve (G1 , G2)  (GG1 , GG2)  (DD1 , DD2)  :- !, wsolve G1 GG1 DD1,
                                                  wsolve G2 GG2 DD2.
wsolve (G1 ; G2)  (GG1 ; GG2)  DD           :- !, (wsolve G1 GG1 DD;
                                                   wsolve G2 GG2 DD).
wsolve (D => G)   GG           DD           :- !, hyp D => wsolve G GG DD.
wsolve (pi G)     (pi GG)      (DD X)       :- !, pi X\
                                                  (wsolve (G X) (GG X) (DD X)).
wsolve (sigma G)  (sigma GG)   (DD T)       :- wsolve (G T) (GG T) (DD T).
wsolve (box G)    (box GG)     DD           :- !, ssolve G GG DD.
wsolve Ga         GGa          DD1          :- ghyp D DD,
                                               wmatch D Ga DD GGa MG,
                                               meta_wsolve MG DD1.
wsolve Ga         GGa          (DD , DD1)   :- !, hyp D,
                                               wmatch D Ga DD GGa MG,
                                               meta_wsolve MG DD1.

ssolve true       true         true         :- !.
ssolve (G1 , G2)  (GG1 , GG2)  (DD1 , DD2)  :- !, ssolve G1 GG1 DD1,
                                                  ssolve G2 GG2 DD2.
ssolve (pi G)     (pi GG)      (DD X)       :- !, pi X\
                                                  (ssolve (G X) (GG X) (DD X)).
ssolve (box G)    (box GG)     DD           :- !, ssolve G GG DD.
ssolve Ga         GGa          DD1          :- ghyp D DD,
                                               smatch D Ga DD GGa MG,
                                               meta_wsolve MG DD1.
ssolve Ga         GGa          (DD , DD1)   :- !, hyp D,
                                               smatch D Ga DD GGa MG,
                                               meta_wsolve MG DD1.

ssolve (G1 ; G2) (GG1 ; GG2) DD :-
    !, error (writesans "Illegal disjunction in boxed goal").
ssolve (D => G) (DD1 => GG) DD2 :-
    !, error (writesans "Illegal implication in boxed goal").
ssolve (sigma G) (sigma GG) (DD T) :-
    !, error (writesans "Illegal existential in boxed goal").
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match 

(D1  ,  D2) 

Ga 

DD 

GGa 

NG  !,  (match  D1  Ga  DD  GGa  NG; 

match  D2  Ga  DD  GGa  NG). 

match 

(G  =>  D) 

Ga 

(GG  =>  DD) 

GGa 

(gsolve  G  GG,  NG) 

!,  match  D  Ga  DD  GGa  NG. 

match 

(pi  D) 

Ga 

DD 

GGa 

NG  ! ,  match  (D  X)  Ga  DD  GGa  NG 

match 

(box  D) 

Ga 

(box  D) 

GGa 

NG  ! ,  bmatch  D  Ga  D  GGa  NG. 

match 

Ga 

Ga 

GGa 

GGa 

true. 

•match 

(D1  ,  D2) 

Ga 

DD 

GGa 

NG  !,  (smatch  D1  Ga  DD  GGa  NG; 

smatch  D2  Ga  DD  GGa  NG) . 

smatch 

(G  =>  D) 

Ga 

(GG  =>  DD) 

GGa 

(gsolve  G  GG,  NG) 

!,  smatch  D  Ga  DD  GGa  NG. 

smatch 

(pi  D) 

Ga 

DD 

GGa 

NG  !,  smatch  (D  X)  Ga  DD  GGa  NG 

smatch 

(box  D) 

Ga 

(box  D) 

GGa 

(box  NG) 

!,  bmatch  D  Ga  D  GGa  NG. 

bmatch (D1 , D2) Ga (DD1 , DD2) GGa NG :- !, (bmatch D1 Ga DD1 GGa NG ;
                                              bmatch D2 Ga DD2 GGa NG).

bmatch (G => D) Ga (GG => DD) GGa (gsolve G GG , NG) :-
    !, bmatch D Ga DD GGa NG.

bmatch (pi D) Ga (pi DD) GGa NG :- !, bmatch (D X:A) Ga
                                             (DD Y:A) GGa NG.

bmatch (box D) Ga (box D) GGa NG :- !, bmatch D Ga D GGa NG.

bmatch Ga Ga GGa GGa true.

match (D1 ; D2) Ga DD GGa NG :-
    !, error (writesans "Illegal disjunction in program").

match (sigma D) Ga DD GGa NG :-
    !, error (writesans "Illegal existential in program").

smatch (D1 ; D2) Ga DD GGa NG :-
    !, error (writesans "Illegal disjunction in program").

smatch (sigma D) Ga DD GGa NG :-
    !, error (writesans "Illegal existential in program").

bmatch (D1 ; D2) Ga DD GGa NG :-
    !, error (writesans "Illegal disjunction in program").

bmatch (sigma D) Ga DD GGa NG :-
    !, error (writesans "Illegal existential in program").
% Meta-interpreter invocation.

dosolve G GG DD :- !, wsolve G GG DD1, breduce DD1 DD2, reduce DD2 DD.

% Solution of accumulated Meta-goals.

meta_wsolve true true :- !.
meta_wsolve (MG1 , MG2) (DD1 , DD2) :- !, meta_wsolve MG1 DD1,
    meta_wsolve MG2 DD2.
meta_wsolve (box MG) DD :- !, meta_ssolve MG DD.
meta_wsolve (gsolve G GG) DD :- !, wsolve G GG DD.

meta_ssolve true true :- !.
meta_ssolve (MG1 , MG2) (DD1 , DD2) :- !, meta_ssolve MG1 DD1,
    meta_ssolve MG2 DD2.
meta_ssolve (box MG) DD :- !, meta_ssolve MG DD.
meta_ssolve (gsolve G GG) DD :- !, ssolve G GG DD.


% Replaces "(box H)" with "true" in DD, the set of sufficient
% conditions.

breduce true true :- !.

% Above should be first to avoid infinite recursion on logical variables.
% This allows uninstantiated variables to be 'reduced' out of the picture.
% (Good for all but degenerate higher-order generalizations.)

breduce (H1 , H2) (H1i , H2i) :- breduce H1 H1i, breduce H2 H2i.
breduce (H1 ; H2) (H1i ; H2i) :- breduce H1 H1i, breduce H2 H2i.
breduce (H1 => H2) (H1i => H2i) :- breduce H1 H1i, breduce H2 H2i.
breduce (pi H) (pi Hi) :- pi X\ (breduce (H X) (Hi X)).
breduce (sigma H) (sigma Hi) :- pi X\ (breduce (H X) (Hi X)).
breduce (box H) true.
breduce Ha Ha.


% Simplifies sufficient conditions by removing superfluous true's.

reduce true true :- !.

% Above should be first to avoid infinite recursion on logical variables.
% This allows uninstantiated variables to be 'reduced' out of the picture.
% (Good for all but degenerate higher-order generalizations.)

reduce (H1 , H2) H :- !, reduce H1 H1i, reduce H2 H2i,
    reduce1 (H1i , H2i) H.
reduce (H1 ; H2) H :- !, reduce H1 H1i, reduce H2 H2i,
    reduce1 (H1i ; H2i) H.
reduce (H1 => H2) H :- !, reduce H1 H1i, reduce H2 H2i,
    reduce1 (H1i => H2i) H.
reduce (pi H1) H :- !, pi X\ (reduce (H1 X) (H1i X)),
    reduce1 (pi H1i) H.
reduce (sigma H1) H :- !, pi X\ (reduce (H1 X) (H1i X)),
    reduce1 (sigma H1i) H.
reduce (box H1) H :- !, reduce H1 H1i,
    reduce1 (box H1i) H.
reduce Ha Ha :- !.

reduce1 true true :- !.
reduce1 (true , H2) H2 :- !.
reduce1 (H1 , true) H1 :- !.
reduce1 (true ; H2) true :- !.
reduce1 (H1 ; true) true :- !.
reduce1 (true => H2) H2 :- !.
reduce1 (H1 => true) true :- !.
reduce1 (pi X\ true) true :- !.
reduce1 (sigma X\ true) true :- !.
reduce1 (box true) true :- !.
reduce1 H H :- !.
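
The two post-processing passes applied to the sufficient conditions (boxed conditions replaced by true, then superfluous true's removed bottom-up) can be sketched outside the logic-programming setting. The following Python version is only an illustration under stated assumptions: formulas are encoded as nested tuples with the hypothetical tags "and", "or", "imp", "pi", "sigma" and "box", binders carry an explicit variable name instead of a lambda abstraction, and logical variables are not modelled at all.

```python
TRUE = "true"
BINARY = ("and", "or", "imp")      # hypothetical tags for ',', ';', '=>'
BINDERS = ("pi", "sigma")          # encoded as (tag, variable_name, body)

def breduce(h):
    """Replace every (box H) subformula by true."""
    if isinstance(h, tuple):
        tag = h[0]
        if tag == "box":
            return TRUE
        if tag in BINARY:
            return (tag, breduce(h[1]), breduce(h[2]))
        if tag in BINDERS:
            return (tag, h[1], breduce(h[2]))
    return h

def reduce1(h):
    """One simplification step at the root of an already reduced formula."""
    if not isinstance(h, tuple):
        return h
    tag = h[0]
    if tag == "and":
        if h[1] == TRUE:
            return h[2]
        if h[2] == TRUE:
            return h[1]
    elif tag == "or":
        if h[1] == TRUE or h[2] == TRUE:
            return TRUE
    elif tag == "imp":
        if h[1] == TRUE:
            return h[2]
        if h[2] == TRUE:
            return TRUE
    elif tag in BINDERS:
        if h[2] == TRUE:
            return TRUE
    elif tag == "box":
        if h[1] == TRUE:
            return TRUE
    return h

def reduce(h):
    """Simplify subformulas first, then the root."""
    if isinstance(h, tuple):
        tag = h[0]
        if tag in BINARY:
            return reduce1((tag, reduce(h[1]), reduce(h[2])))
        if tag in BINDERS:
            return reduce1((tag, h[1], reduce(h[2])))
        if tag == "box":
            return reduce1((tag, reduce(h[1])))
    return h

# Composition in the order used by dosolve: breduce first, then reduce.
conds = ("and", ("box", "p"), ("and", "q x", ("imp", ("box", "r"), TRUE)))
simplified = reduce(breduce(conds))
```

Here `simplified` is the single remaining condition "q x": both boxed conditions are discharged and the resulting true's collapse away.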
A.5 Tactic-style Integration

A.5.1 Tactics for Integration

module integrate_tac.
import tacticals.

type expn   int -> int -> int.
type ~      int -> int.
type log    int -> int.
type cos    int -> int.
type sin    int -> int.
type intgr  (int -> int) -> (int -> int) -> o.

type dx     name.
type cl     name.
type p1     name.
type pm1    name.
type pw     name.
type cll    name.
type clr    name.
type pl     name.
type cos_   name.

!! tac dx   (intgr x\1 x\x) true.

!! tac cl   (intgr x\A x\(A * x)) true.

!! tac p1   (intgr x\x x\((expn x 2) div 2)) true.

!! tac pm1  (intgr x\(expn x (~ 1)) x\(log x)) true.

!! tac pw   (intgr x\(expn x A) x\((expn x (A + 1)) div (A + 1))) true.

!! tac cll  (intgr x\(A * (B x)) x\(A * (Bi x)))
            (intgr x\(B x) x\(Bi x)).

!! tac clr  (intgr x\((B x) * A) x\((Bi x) * A))
            (intgr x\(B x) x\(Bi x)).

!! tac pl   (intgr x\((A x) + (B x)) x\((Ai x) + (Bi x)))
            ((intgr x\(A x) x\(Ai x)) ,
             (intgr x\(B x) x\(Bi x))).

!! tac cos_ (intgr cos sin) true.
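
As an informal reading of the table above, each tactic maps an integration goal to the subgoals that remain (true when none remain). The Python sketch below applies the same rules directly, as a first-match recursive function. It is an assumption-laden illustration rather than the tactic mechanism itself: expressions are nested tuples with hypothetical tags, "constant" is taken to mean a Python number, division is real-valued, and rules are tried in listing order, whereas the tactics are selected by name.

```python
import math

NUM = (int, float)

def integrate(e):
    """Return an antiderivative of e with respect to x, by first matching rule."""
    if e == 1:
        return "x"                                          # dx
    if isinstance(e, NUM):
        return ("*", e, "x")                                # cl
    if e == "x":
        return ("div", ("expn", "x", 2), 2)                 # p1
    if not isinstance(e, tuple):
        raise ValueError("no rule applies")
    tag = e[0]
    if tag == "expn" and e[1] == "x" and isinstance(e[2], NUM):
        if e[2] == -1:
            return ("log", "x")                             # pm1
        return ("div", ("expn", "x", e[2] + 1), e[2] + 1)   # pw
    if tag == "*" and isinstance(e[1], NUM):
        return ("*", e[1], integrate(e[2]))                 # cll
    if tag == "*" and isinstance(e[2], NUM):
        return ("*", integrate(e[1]), e[2])                 # clr
    if tag == "+":
        return ("+", integrate(e[1]), integrate(e[2]))      # pl
    if e == ("cos", "x"):
        return ("sin", "x")                                 # cos_
    raise ValueError("no rule applies")

def evaluate(e, x):
    """Numerically evaluate an expression tree at the point x."""
    if e == "x":
        return x
    if isinstance(e, NUM):
        return e
    tag = e[0]
    if tag in ("log", "sin", "cos"):
        return getattr(math, tag)(evaluate(e[1], x))
    a, b = evaluate(e[1], x), evaluate(e[2], x)
    if tag == "+":
        return a + b
    if tag == "*":
        return a * b
    if tag == "div":
        return a / b
    if tag == "expn":
        return a ** b
    raise ValueError(tag)

integrand = ("+", ("*", 3, ("expn", "x", 2)), ("cos", "x"))
antiderivative = integrate(integrand)
```

A quick sanity check is to differentiate `antiderivative` numerically and compare it with `integrand` at a sample point; the two should agree up to the finite-difference error.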
A.5.2 Tacticals

module tacticals.
import simplify.

kind name            type.

type tac             name -> o -> o -> o.
index tac 1.

type maptac          name -> name.
type then            name -> name -> name.
type orelse          name -> name -> name.
type repeat          name -> name.
type idtac           name.
type try             name -> name.
type complete        name -> name.
type quit            name.
type stop            name -> name.
type interact_solve  name.

!! tac (maptac T) true true.

!! tac (maptac T) (OGa , OGb) OG :- !,
    tac (maptac T) OGa OGa1,
    tac (maptac T) OGb OGb1,
    simpl (OGa1 , OGb1) OG.

!! tac (maptac T) (OGa ; OGb) OG :- !,
    tac (maptac T) OGa OGa1,
    tac (maptac T) OGb OGb1,
    simpl (OGa1 ; OGb1) OG.

!! tac (maptac T) IG OG :-
    tac T IG OG1,
    simpl OG1 OG.
!! stop_tac ok.

!! stop_tac pop_.

!! tac top_solve (pi OG1) OG :- !, pi X\(tac top_solve (OG1 X) (OG2 X)),
    simpl (pi OG2) OG.

!! tac top_solve (OG1 , OG2) OG :- !, tac top_solve OG1 OG3,
    tac top_solve OG2 OG4,
    simpl (OG3 , OG4) OG.

!! tac top_solve OG1 OG :- !, tac interact_solve OG1 OG.

!! tac interact_solve true true :- !,
    nl, writesans " - Subtree solved ...", nl, nl.

!! tac interact_solve OG1 OG :- !,
    nl, writesans "Goal to be reduced: ", nl, nl,
    write OG1, nl,
    nl, writesans "Enter rule ",
    read T\(tac T OG1 OG2, simpl OG2 OG3,
            ((stop_tac T, OG = OG3) ;
             (tac top_solve OG3 OG))).

!! tac direct (pi OG1) OG :- !, pi X\(tac direct (OG1 X) (OG2 X)),
    simpl (pi OG2) OG.

!! tac direct (OG1 , OG2) OG :- !, tac direct OG1 OG3,
    tac direct OG2 OG4,
    simpl (OG3 , OG4) OG.

!! tac direct OG1 OG :- tac Name OG1 OG2, simpl OG2 OG3,
    ( (OG3 = true, !, OG = true)
    ; (tac direct OG3 OG, !)
    ; OG3 = OG
    ).
module tail_rec_itac.
import tactical_iebg.

% Transformational program development mapping recursive applicative
% functions (of a certain form) to tail-recursive. Illustrates
% higher-order EBG.

% This variation is designed to work with interactive generalizing
% tactics.

% Scott Dietzen, 1990

kind exp             type.

type ife             exp -> exp -> exp -> exp.
type not_            exp -> exp.
type lam             (exp -> exp) -> exp.
type appl            exp -> exp -> exp.
type fix             (exp -> exp) -> exp.
type equals          exp -> exp -> exp.
type times           exp -> exp -> exp.
type minus           exp -> exp -> exp.
type nil_            exp.
type cons_           exp -> exp -> exp.
type null            exp -> exp.
type car             exp -> exp.
type cdr             exp -> exp.
type append          exp -> exp -> exp.
type zero            exp.
type one             exp.

type associative     (exp -> exp -> exp) -> o.
type commutative     (exp -> exp -> exp) -> o.
type left_identity   (exp -> exp -> exp) -> exp -> o.
type right_identity  (exp -> exp -> exp) -> exp -> o.

type getrev          exp -> o.
type getdiff         exp -> o.
type getfact         exp -> o.

type insert_lam      (exp -> exp) -> name.
type add_oper_rid    (exp -> exp -> exp) ->
                     ((exp -> exp) -> exp) -> name.
type abstract_arg    (exp -> exp) -> (exp -> exp) -> name.
type name_fn         exp -> (exp -> exp) -> name.
type unfold          (exp -> exp) -> name.
type reduce_1        ((exp -> exp) -> exp) -> name.
type dist_ife_2      (exp -> exp -> exp) ->
                     ((exp -> exp -> exp) -> exp) -> name.
type left_id_2       (exp -> exp -> exp) ->
                     ((exp -> exp -> exp) -> exp) -> name.
type assoc_2         (exp -> exp -> exp) ->
                     ((exp -> exp -> exp) -> exp) -> name.
type fold_two_3      ((exp -> exp -> exp -> exp) -> exp) ->
                     (exp -> exp -> exp -> exp) ->
                     exp -> name.

right_identity minus zero.

associative times.
commutative times.
left_identity times one.
right_identity times one.

associative append.
left_identity append nil_.
right_identity append nil_.


getfact (fix Fact\ (lam N\
    (ife (equals N zero) one (times (appl Fact (minus N one)) N)))).

getrev (fix Rev\ (lam L\
    (ife (null L) nil_
         (append (appl Rev (cdr L)) (cons_ (car L) nil_))))).

getdiff (fix Diff\ (lam L\
    (ife (null L) zero (minus (car L) (appl Diff (cdr L)))))).
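
To make the goal of the transformation concrete, the following Python sketch writes out the factorial and reverse definitions above next to the accumulator-passing forms that an associative operator with an identity permits. The functions `fact_acc` and `rev_acc` are hypothetical names for this illustration, written by hand rather than produced by the tactics; the accumulator is the first argument, and each function is specified by the equation given in its comment.

```python
def fact(n):
    # getfact: ife (equals N zero) one (times (appl Fact (minus N one)) N)
    return 1 if n == 0 else fact(n - 1) * n

def fact_acc(m, n):
    # Specification: fact_acc(m, n) == fact(n) * m.
    # Base case uses the left identity of times; the step case uses
    # associativity: (fact(n - 1) * n) * m == fact(n - 1) * (n * m).
    return m if n == 0 else fact_acc(n * m, n - 1)

def rev(l):
    # getrev: ife (null L) nil_ (append (appl Rev (cdr L)) (cons_ (car L) nil_))
    return [] if not l else rev(l[1:]) + [l[0]]

def rev_acc(m, l):
    # Specification: rev_acc(m, l) == rev(l) + m, using the same two
    # properties of append (associativity, nil_ as left identity).
    return m if not l else rev_acc([l[0]] + m, l[1:])

# The original functions are recovered by starting from the right identity.
assert fact_acc(1, 5) == fact(5)
assert rev_acc([], [1, 2, 3]) == rev([1, 2, 3])
```

The difference example is not included here: only a right identity is declared for minus, and minus is not declared associative, so the same rewriting is not justified for it.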
!! tac (insert_lam C)
    (C (fix f\(lam n\(G f n))))
    (C (lam m\(appl (fix f\(lam n\(G f n))) m))).

!! tac (add_oper_rid Op C)
    (C x\(G x))
    (C x\(Op (G x) A)) :- right_identity Op A.

!! tac (abstract_arg C1 C2)
    (C1 (C2 A))
    (C1 (appl (lam m\(C2 m)) A)).

!! tac (name_fn Fnew C)
    (C G)
    (C (fix fnew\G)) :- Fnew = (fix fnew\G).

!! tac (unfold C)
    (C (fix f\(G f)))
    (C (G (fix f\(G f)))).

!! tac (reduce_1 C)
    (C x\(appl (lam n\(G n)) x))
    (C x\(G x)).

!! tac (dist_ife_2 Op C)
    (C x\y\(Op (ife (Bool x y) (E1 x y) (E2 x y)) (H x y)))
    (C x\y\(ife (Bool x y) (Op (E1 x y) (H x y))
                           (Op (E2 x y) (H x y)))).

!! tac (left_id_2 Op C)
    (C x\y\(Op A (H x y)))
    (C x\y\(H x y)) :- left_identity Op A.

!! tac (assoc_2 Op C)
    (C x\y\(Op (Op (H1 x y) (H2 x y)) (H3 x y)))
    (C x\y\(Op (H1 x y) (Op (H2 x y) (H3 x y)))) :- associative Op.

!! tac (fold_two_3 C1 C2 (fix f\(lam m\(lam n\(C2 F n m)))))   % no occur f
    (C1 f\x\y\(C2 F (H1 x y) (H2 x y)))                        % no occur f
    (C1 f\x\y\(appl (appl f (H2 x y)) (H1 x y))).

% C1: context within input program to be replaced.
% C2: matches some function F in both the original definition
%%%%%%%%%%% The invocation:

% getfact F, tac top_solve F Fout.

%%%%%%%%%%% The inputs as prompted

% 1. insert_lam    G\G.

% 2. add_oper_rid  times G\(lam n\(G n)).

% 3. abstract_arg  G\G
%                  G\(lam n\(times (W0 n) G)).

% 4. name_fn       Fnew G\(appl G W).

% 5. unfold        G\(appl (fix f1\(lam m\(lam n\
%                      (times (appl G n) m)))) W).

% 6. reduce_1      G\(appl (fix f1\(lam m\(lam n\
%                      (times (G n) m)))) W).

% 7. dist_ife_2    times G\(appl (fix f1\(lam m\(lam n\(G m n)))) W).

% 8. left_id_2     times G\(appl (fix f1\(lam m\(lam n\
%                      (ife (W1 m n) (G m n) (W3 m n))))) W).

% 9. assoc_2       times G\(appl (fix f1\(lam m\(lam n\
%                      (ife (W1 m n) (W2 m n) (G m n))))) W).

% 10. fold_two_3   G\(appl (fix f1\(lam m\(lam n\
%                      (ife (W1 m n) (W2 m n) (G f1 m n))))) W)
%                  G\H1\H2\(times (appl G H1) H2)
%                  Fnew.

% 11. ok.
A.7 Constraints


We illustrate the nature of the higher-order constraints with the set of constraint equations
coming out of the program transformation example introduced in Chapter 7.

[
  <Op (appl G1 (F101 m l1)) (F10 m l1) == F9 m l1>
  <Op (appl G1 n) h2 == F3 h2 n>
  <F9 x x1 == Op (H1 x x1) (Op (H2 x x1) (H3 x x1))>
  <Op (Op (H1 m1 l11) (H2 m1 l11)) (H3 m1 l11) == F71 m1 l11>
  <F6 x2 x3 == Op (G2 x3) x2>
  <Op (appl (lam N \ (G2 N)) x4) x5 == F5 x5 x4>
  <Op (lam N \ (G3 m2 N)) (H m2) ==
      Op (lam N1 \ (appl (fix F \ (lam N \ (G F N))) N1)) m2>
  <F3 n1 m3 == Op (G3 n1 m3) (H n1)>
  <Op (appl (fix F \ (G4 F)) n2) m4 == F3 m4 n2>
  <F5 x6 x7 == Op (appl (G4 (fix F \ (G4 F))) x7) x6>
  <Op (ife (F7 m5 l12) (E1 m5 l12) (E2 m5 l12)) (H4 m5 l12) == F6 m5 l12>
  <Op DD (F8 m6 l13) == Op (E1 m6 l13) (H4 m6 l13)>
  <F71 m7 l14 == Op (E2 m7 l14) (H4 m7 l14)>
]

Prefix Fragment:

[sigma DD F10 F101 F3 F5 F6 F7 F71 F8 F9 G Op] [sigma G3 H] (pi m2 n1 m3)
[sigma G4] (pi m4 n2 x6 x7) [sigma G2] (pi x5 x4 x2 x3) [sigma E1 E2 H4]
(pi m5 l12 m7 l14 m6 l13) [sigma H1 H2 H3] (pi m1 l11 x x1) [sigma G1]
(pi h2 n m l1)